Thanks, everyone. That's quite a few pointers; I will spend some time digesting them all.
So there are really two approaches: large, complex kernels on one hand and AVX2/AVX/FMA on the other, or a combination of the two. I guess I should propose identifying and implementing larger complex kernels and then accelerating them further using AVX2/FMA etc. Doing both will of course limit the number of applications/algorithms I can feasibly target. What's your take on this?

Abhishek

On Wed, Feb 26, 2014 at 5:03 AM, West, Nathan <n...@ostatemail.okstate.edu> wrote:
> On Tue, Feb 25, 2014 at 4:37 PM, West, Nathan
> <n...@ostatemail.okstate.edu> wrote:
>>> > On Sun, 2/23/14, Abhishek Bhowmick <abhowmic...@gmail.com> wrote:
>>> >
>>> > Subject: [Discuss-gnuradio] Google Summer of Code 2014 applicant : Optimization with VOLK
>>> > To: discuss-gnuradio@gnu.org
>>> > Date: Sunday, February 23, 2014, 8:52 AM
>>> >
>>> > Hello,
>>> > I have completed a Bachelor's degree in Electrical Engineering
>>> > from IIT Bombay, India, and will be joining a master's program in
>>> > Computer Science in August. For the summer, I am interested in
>>> > participating in GSoC 2014, and GNU Radio is an organization where
>>> > my background fits nicely.
>>> >
>>> > I went through the ideas page and was particularly interested in
>>> > doing performance optimization with VOLK. After going through some
>>> > online documentation about the library and the SDR'12 paper, I
>>> > realised that the following areas need work:
>>> >
>>> > 1. Profiling GNU Radio code to identify new kernels and implement
>>> >    them for existing Intel SIMD extensions, also porting kernels
>>> >    to other ISA extensions.
>>> > 2. Better testing of the effects of more complex scheduler logic
>>> >    on larger environments (beyond simple kernels)
>>> >
>>> > 3.
>>> >    Exploring extension of VOLK to GPU ISAs, to leverage chips such as
>>> >    AMD Fusion (However, this seems to be more research than software
>>> >    development)
>>> >
>>> > According to the GSoC proposal, point (1) seems to be the
>>> > expectation. Given this, I would like some advice on how to go
>>> > about looking for potential ideas (and some feedback on the
>>> > feasibility of the other ideas as well)
>>> >
>>> > My background : C++, Python, Signal Processing, Computer Architecture
>>> >
>>> > Thanks,
>>> > Abhishek Bhowmick
>>> >
>>
>> This is a great conversation, and I'll take the opportunity to plug
>> the upcoming VOLK working group call
>> (https://plus.google.com/u/1/events/ch3jrjcvp7mdiqelpismfieg3n0).
>> Bogdan, your results aren't particularly surprising, but the feedback
>> is really good to hear.
>>
>> Back to GSoC:
>>
>> Abhishek,
>>
>>> Thanks for the pointers to gr-atsc and gr-80211. I have started
>>> looking there as a starting point. Are there similar modules which
>>> are undergoing VOLK speedup fixes? I am also trying to meet up with
>>> other people who have been using GNU Radio to identify potential
>>> modules for acceleration. As you are now a mentor organization, I
>>> feel it's a good time for us to get into detailed discussions.
>>
>> From the previous discussion it should be apparent that how algorithms
>> are implemented will make the biggest difference, and that the new
>> acceleration is primarily going to come from larger, more complex
>> kernels. At the end of the day it's going to be your proposal. So far
>> on the list of places to look we have
>>
>> * in-tree OFDM (contact Martin)
>> * gr-atsc (use Andrew Davis' fork)
>> * gr-dvbt
>> * gr-fecapi
>>
>> For your proposal I would recommend looking at their code, then
>> getting in contact with the author(s) of those modules to ask about
>> their thoughts on accelerating blocks they have written.
>> The reality of this project is that we are accelerating some signal
>> processing algorithm, and knowledge of that algorithm is useful for
>> acceleration. Whatever application you have interest and/or knowledge
>> in (fresh out of a BS it's more likely to be interest) should guide
>> your proposal. If you know anything about error correcting codes then
>> the latter 2 would be good fits. OFDM frame detection probably has a
>> gentler learning curve since at the basic level you're looking at
>> convolution, and there are papers you can look for on more involved
>> algorithms. Other algorithms to look at might include AGC or
>> equalizers.
>>
>> If you're interested in GPU programming don't forget to check out gr-gpu.
>>
>>>> At the moment the only mainstream ISA not being targeted is probably
>>>> AVX2, which has some nice features for the type of kernels we're
>>>> doing. If you went that route it would likely require adding
>>>> protokernels to a pretty large number of kernels.
>>>>
>>>> Nathan
>>>
>>> This also seems to be promising, though I guess it would require me
>>> to come up to speed with AVX2 (which I would love to do). Could you
>>> please elaborate a little on the kind of beneficial features you
>>> have in mind? I am concerned that the job of adding proto-kernels
>>> might turn out to be mundane/tedious. Is that a valid concern?
>>
>> Right, so as Martin mentioned the answer is sort of relative. I
>> wouldn't go so far as to say it's mundane, especially if you have
>> little experience with using intrinsics and SIMD instructions. One
>> reason AVX isn't so prominently featured (I suspect) is that the
>> instructions are almost the same as SSE instructions, but the vectors
>> are twice as long, so that is actually mundane. AVX2/FMA extensions
>> introduce some new features to the amd64 instruction set.
>> The most obvious is that it looks like Intel and AMD have finally
>> settled on the same fused multiply-add implementation (there's also a
>> multiply-subtract that's good for complex numbers). That will likely
>> be able to speed things up a bit, but I'm also looking forward to
>> seeing gains from the various gather loads that have been introduced.
>> They allow you to do a single load operation that gathers vector
>> elements that span pretty large ranges. VOLK won't be so interested in
>> the large ranges (except maybe decimators), but it could be useful for
>> loading complex vectors. There are some other math functions we may be
>> able to leverage, but those are two features that I think would be
>> widely applicable.
>>
>> In your proposal you should definitely include what ISAs you intend to
>> use, and if there are features specific to that instruction set then
>> point out why it's a good choice. This is mostly important for
>> choosing between SSE and friends, AVX, and AVX2/FMA. It would be good
>> to see plans that include NEON support for anything you'd add to amd64
>> platforms, but that's not a requirement.
>>
>> Nathan
>
> I also see that GNSS-SDR made it to GSoC and they have a VOLK related project.
> http://gnss-sdr.org/documentation/google-summer-code-2014-ideas-list

Yeah, I also noticed that. I might submit a proposal to them also.

Abhishek

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
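
For readers following the AVX2/FMA discussion in the thread, here is a rough scalar model of the two features Nathan mentions: gather loads used to split an interleaved complex buffer into real and imaginary lanes, and a multiply-subtract/multiply-add pair used for the complex product. This is only a sketch in plain Python. The function names (gather, fmadd, fmsub, complex_multiply_interleaved) and the lanes parameter are made up for illustration and are not VOLK or intrinsic APIs. Real FMA hardware rounds once per operation, whereas this model rounds twice.

```python
def gather(buf, indices):
    """Model of a gather load: one output lane per index into buf."""
    return [buf[i] for i in indices]


def mul(a, b):
    """Lane-wise a * b."""
    return [x * y for x, y in zip(a, b)]


def fmadd(a, b, c):
    """Lane-wise a * b + c (hardware would round once; this rounds twice)."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def fmsub(a, b, c):
    """Lane-wise a * b - c (same rounding caveat as fmadd)."""
    return [x * y - z for x, y, z in zip(a, b, c)]


def complex_multiply_interleaved(a, b, lanes=4):
    """Multiply two interleaved complex buffers [re0, im0, re1, im1, ...].

    `lanes` is a hypothetical vector width in complex elements.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("buffers must have equal, even length")
    n = len(a) // 2
    out = [0.0] * len(a)
    for start in range(0, n, lanes):
        count = min(lanes, n - start)
        # Even indices hold real parts, odd indices hold imaginary parts.
        re_idx = [2 * (start + k) for k in range(count)]
        im_idx = [i + 1 for i in re_idx]
        ar, ai = gather(a, re_idx), gather(a, im_idx)
        br, bi = gather(b, re_idx), gather(b, im_idx)
        # (ar + j*ai) * (br + j*bi):
        #   real = ar*br - ai*bi  -> multiply-subtract
        #   imag = ar*bi + ai*br  -> multiply-add
        re = fmsub(ar, br, mul(ai, bi))
        im = fmadd(ar, bi, mul(ai, br))
        # Scatter the results back into interleaved order.
        for k in range(count):
            out[re_idx[k]] = re[k]
            out[im_idx[k]] = im[k]
    return out
```

For example, complex_multiply_interleaved([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]) multiplies (1+2j)(5+6j) and (3+4j)(7+8j). An actual proto-kernel would replace the list comprehensions with vector intrinsics and would have to handle alignment and tail elements, which this model glosses over.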