Thanks Paul.
----- Original Message -----
From: "Paul Ho" <[EMAIL PROTECTED]>
To: <amibroker@yahoogroups.com>
Sent: Wednesday, August 06, 2008 9:16 AM
Subject: [amibroker] Re: Freakishly fast backtest using 64 cores

> Click on an individual chipset and you'll get the manufacturers that
> are using that chipset.
> --- In amibroker@yahoogroups.com, "Paul Ho" <[EMAIL PROTECTED]> wrote:
>>
>> http://www.nvidia.com/object/cuda_learn_products.html
>>
>> _____
>>
>> From: amibroker@yahoogroups.com [mailto:[EMAIL PROTECTED]] On Behalf
>> Of cstrader
>> Sent: Wednesday, 6 August 2008 10:51 PM
>> To: amibroker@yahoogroups.com
>> Subject: Re: [amibroker] Re: Freakishly fast backtest using 64 cores
>>
>> Which video cards provide this feature? As far as I can tell, it's
>> only the 8-Series (G8X) GPU from NVIDIA, found in the GeForce, Quadro
>> and Tesla lines. Who will have one of these? Are many people likely
>> to have them in the future?
>>
>> Thanks
>>
>> ----- Original Message -----
>> From: "dloyer123" <[EMAIL PROTECTED]>
>> To: <amibroker@yahoogroups.com>
>> Sent: Wednesday, August 06, 2008 12:22 AM
>> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
>>
>> > Very good question. That was a head scratcher.
>> >
>> > So the thing is, AmiBroker does a lot more work in an optimization
>> > pass than execute AFL code. In fact, the AFL code may take very
>> > little of the total run time.
>> >
>> > As an example, using a database with a good amount of data, write
>> > an AFL file that does nothing but set the buy/sell/short/cover
>> > arrays to 0. The backtest will still take a good bit of time.
>> >
>> > So, even reducing the AFL run time to zero is not enough. It will
>> > not help much at all.
>> >
>> > So, to avoid this, I pass a "mode" variable to my Dll.
>> > This mode is set by a simple optimization statement:
>> >
>> > mode = optimize("mode",0,1,3,1);
>> >
>> > When mode = 0, the dll will evaluate one symbol like a normal dll.
>> > So if I click on a bar, it will update my printf statements, etc.
>> > buy/sell/short/cover arrays are set. A single backtest (not
>> > optimize) will use the normal AmiBroker trade match and evaluate
>> > code and generate stats as normal.
>> >
>> > When mode = 1, this means load the data. The Dll will copy the
>> > price data to a staging area in memory. buy/sell/short/cover are
>> > set to 0 to generate no trades. Having AmiBroker align the symbol
>> > bars was a big help here.
>> >
>> > When mode = 2, on the first symbol and the first symbol only, it
>> > loads the price data to the video card and executes as many
>> > backtest passes as it needs at a few ms per pass. Once the best
>> > combination is found it returns. buy/sell/short/cover are set to 0.
>> > Note that I cannot use the AmiBroker signal match and fitness
>> > function code. I have to provide my own. This is where the
>> > performance advantage of all of the extra cores comes into play.
>> > It may run hundreds or thousands of parameter combinations very
>> > quickly. I can't use the built-in optimize support, but brute force
>> > is enough for now. After all, I get 200 combinations per second.
>> >
>> > When mode = 3, each symbol evaluates using the best parameters
>> > found on the last mode=2 run. buy/sell/short/cover are set. In a
>> > walkforward test, this will always have the best score and be used
>> > for the walkforward step. A custom backtest function adds the
>> > chosen parameters to the backtest report. Mode 3 works like mode 0
>> > except it uses the optimal parameters rather than default values.
>> >
>> > The Status("action") and Status("actionex") codes could also be
>> > used, but they did not tell me quite what I needed to know.
>> > Also, I could have avoided the mode=2 step if I could find a way to
>> > know I was on the last symbol and run the optimization then. I
>> > guess I could pass the name of the last symbol.
>> >
>> > So I use AmiBroker to load and keep the database, visualize the
>> > trades, validate, walkforward and provide deep metrics of the
>> > backtest.
>> >
>> > If I wanted to take this further, I would move the trade system
>> > logic out of the Dll and make it programmable from AFL. That way it
>> > could be used by anyone without needing to program C. I would do
>> > this by passing handles to CUDA arrays through the AFL code.
>> >
>> >
>> > --- In amibroker@yahoogroups.com, "Paul Ho" <paul.tsho@> wrote:
>> >>
>> >> thanks for your insight.
>> >> I hope you don't mind sharing a little bit more detail.
>> >> You said:
>> >> > To get the best performance, my AFL code makes one pass over the
>> >> > data, calling a Dll. The Dll takes all of the data needed by the
>> >> > calculation and loads a copy to the video card. This upload is
>> >> > slow, the entire upload takes about 45 seconds for all 1000
>> >> > symbols.
>> >> >
>> >> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> >> > the graphics cores that perform the actual computation and
>> >> > generates the trade list.
>> >>
>> >> Normally AB loads the data from the database as needed, calls a
>> >> function in a dll, and passes data in arrays or whatever as
>> >> arguments of the function. The function will be called for every
>> >> ticker in the watchlist, and data pertaining to that symbol is
>> >> passed each time. I wonder how you do a "single pass" over the
>> >> data, because AB passes the data as part of the argument
>> >> regardless of how many optimizations it had previously run with
>> >> the same data. I just wonder how you do it.
>> >> cheers
>> >> Paul.
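The four-mode scheme described above (mode 0 through mode 3) can be sketched roughly as follows. This is only an illustration in Python, not the author's C/CUDA plugin: the names PluginState, signals, score and on_symbol, the toy trading rule, the toy fitness function and the candidate thresholds are all assumptions made up for the example, and the "video card" is just an in-memory dictionary.

```python
from dataclasses import dataclass, field
from typing import Optional

MODE_SINGLE, MODE_LOAD, MODE_OPTIMIZE, MODE_APPLY_BEST = 0, 1, 2, 3

@dataclass
class PluginState:
    staged: dict = field(default_factory=dict)  # symbol -> prices copied in mode 1
    best: Optional[float] = None                # parameter chosen in mode 2
    searched: bool = False                      # True once mode 2 has run

def signals(prices, threshold):
    """Toy rule (an assumption): buy when a bar rises by more than threshold."""
    return [0] + [1 if b - a > threshold else 0
                  for a, b in zip(prices, prices[1:])]

def score(staged, threshold):
    """Toy fitness (an assumption): sum of next-bar gains after each buy."""
    total = 0.0
    for prices in staged.values():
        buys = signals(prices, threshold)
        for i in range(len(prices) - 1):
            if buys[i]:
                total += prices[i + 1] - prices[i]
    return total

def on_symbol(state, mode, symbol, prices, default=1.0,
              candidates=(0.5, 1.0, 2.0)):
    """Return the buy array for one symbol, dispatching on mode."""
    flat = [0] * len(prices)                    # "no trades" array
    if mode == MODE_SINGLE:                     # behave like a normal plugin
        return signals(prices, default)
    if mode == MODE_LOAD:                       # stage data, emit no trades
        state.staged[symbol] = list(prices)
        state.searched = False
        return flat
    if mode == MODE_OPTIMIZE:                   # search once, on the first symbol
        if not state.searched:
            state.best = max(candidates,
                             key=lambda t: score(state.staged, t))
            state.searched = True
        return flat
    if mode == MODE_APPLY_BEST:                 # like mode 0, with the best value
        return signals(prices, state.best)
    raise ValueError(f"unknown mode {mode}")

# Made-up price data; the host application would call once per symbol per mode.
data = {"AAA": [10, 11, 13, 12, 15], "BBB": [20, 20.6, 21.5, 23, 22]}
state = PluginState()
results = {}
for mode in (MODE_LOAD, MODE_OPTIMIZE, MODE_APPLY_BEST):
    for symbol, prices in data.items():
        results[(mode, symbol)] = on_symbol(state, mode, symbol, prices)
print("best threshold:", state.best)
```

The point of the sketch is that the host still walks every symbol in every mode, but modes 1 and 2 return empty signal arrays, so the expensive search happens only once per optimization cycle.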
>> >>
>> >>
>> >> --- In amibroker@yahoogroups.com, "dloyer123" <dloyer123@> wrote:
>> >> >
>> >> > This uses the mid range video card that happened to come with my
>> >> > system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
>> >> > faster. The 260 can be found at Best Buy for $300. Some laptops
>> >> > have compatible cards as well.
>> >> >
>> >> > The video card has its own memory, mine has 512MB, some have as
>> >> > much as 1GB. This memory is very fast, once it is loaded from
>> >> > the main system. Nvidia has a professional line of products that
>> >> > have much more memory.
>> >> >
>> >> > To get the best performance, my AFL code makes one pass over the
>> >> > data, calling a Dll. The Dll takes all of the data needed by the
>> >> > calculation and loads a copy to the video card. This upload is
>> >> > slow, the entire upload takes about 45 seconds for all 1000
>> >> > symbols.
>> >> >
>> >> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> >> > the graphics cores that perform the actual computation and
>> >> > generates the trade list. This part is very fast and performs
>> >> > all of the same functions that my AFL version does. The
>> >> > resulting trade list is the same.
>> >> >
>> >> > Because the data is loaded into video memory, it can be reused
>> >> > for many passes over the data with different optimization
>> >> > values. So, hundreds of combinations of optimization values can
>> >> > be tried per second.
>> >> >
>> >> > For non optimization runs, the Dll just loads one symbol into
>> >> > video memory and processes it. Counting the overhead of moving
>> >> > data to the video card and extracting the trade list for a
>> >> > single symbol, the result is similar to AFL code alone. This
>> >> > lets me test the code and make sure it is correct.
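A back-of-the-envelope check of why reuse matters, using only figures quoted in this thread: about 45 seconds for the one-time upload, about 4.4 ms per pass on the video card, and about 25 seconds per pass with AFL alone. It assumes those three numbers refer to comparable workloads, which the thread does not state outright, so treat the result as a rough estimate.

```python
UPLOAD_S = 45.0      # one-time copy of ~1000 symbols to video memory (from the thread)
GPU_PASS_S = 0.0044  # reported time per backtest pass on the video card
AFL_PASS_S = 25.0    # reported time per backtest pass with AFL alone

def gpu_total(passes):
    """Total seconds when the upload is paid once and every pass reuses it."""
    return UPLOAD_S + passes * GPU_PASS_S

def afl_total(passes):
    """Total seconds when every pass costs the full AFL run time."""
    return passes * AFL_PASS_S

def break_even():
    """Smallest number of passes at which the video card wins overall."""
    n = 1
    while gpu_total(n) >= afl_total(n):
        n += 1
    return n

passes_per_second = 1.0 / GPU_PASS_S        # kernel throughput, upload excluded
kernel_speedup = AFL_PASS_S / GPU_PASS_S    # per-pass ratio, upload excluded

print("break-even passes:", break_even())
print("passes per second:", round(passes_per_second))
print("per-pass speedup:", round(kernel_speedup))
for n in (1, 2, 200, 1000):
    print(n, "passes: AFL", round(afl_total(n), 1), "s, GPU", round(gpu_total(n), 1), "s")
```

With these inputs a single pass is slower on the video card because of the upload, which matches the remark that single-symbol runs see little gain; the advantage appears once the same uploaded data is reused for many passes.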
>> >> >
>> >> > This approach works best when the data only needs to be loaded
>> >> > once, then reused many times. It also works best when there is a
>> >> > lot of data to work with.
>> >> >
>> >> > What is more interesting to me and what would be more useful for
>> >> > others would be a general driver that requires no Dll changes to
>> >> > modify the system. The performance would not be as good as hand
>> >> > optimized code, but would still be much better than AFL code
>> >> > alone. It would take trading system design to a whole new level.
>> >> > It would provide enough performance to make working with intraday
>> >> > data as easy as daily data is today.
>> >> >
>> >> > Writing such a driver would be hard, but I have already done
>> >> > some prototypes and design work. I am tempted to do it for my
>> >> > own use. If I made it available to others, supporting it would
>> >> > be a PITA.
>> >> >
>> >> >
>> >> > --- In amibroker@yahoogroups.com, "Paul Ho" <paul.tsho@> wrote:
>> >> > >
>> >> > > I'm very interested.
>> >> > > Could you elaborate a bit more?
>> >> > > What model of Nvidia chipset are you using, and with how much
>> >> > > memory?
>> >> > > Not sure exactly what you mean when you say
>> >> > > It uses AmiBroker to load the symbol data and perform
>> >> > > calculations that do not depend on the optimization
>> >> > > parameters. Once loaded into video memory, repeated passes can
>> >> > > be made with different parameters, avoiding any overhead.
>> >> > > Can you give me some examples? I presume when your dll is
>> >> > > called, AB passes one or more arrays of data belonging to 1
>> >> > > symbol, is that true?
>> >> > > Not sure exactly what the rest means either. How many
>> >> > > functions are you running in your dll, and what does each of
>> >> > > them do?
>> >> > > Great of you to share your insight.
>> >> > > Cheers
>> >> > > Paul.
>> >> > >
>> >> > > _____
>> >> > >
>> >> > > From: amibroker@yahoogroups.com
>> >> > > [mailto:amibroker@yahoogroups.com] On Behalf Of dloyer123
>> >> > > Sent: Tuesday, 5 August 2008 9:19 AM
>> >> > > To: amibroker@yahoogroups.com
>> >> > > Subject: [amibroker] Freakishly fast backtest using 64 cores
>> >> > >
>> >> > > Greetings,
>> >> > >
>> >> > > I ported part of my AFL backtest code to a plugin that takes
>> >> > > advantage of the graphics math cores on the video card that
>> >> > > are normally used for 3d graphics.
>> >> > >
>> >> > > I was able to get a several thousand fold performance
>> >> > > improvement over AFL code alone.
>> >> > >
>> >> > > My goal was to reduce the 25 seconds AFL code alone uses for a
>> >> > > single portfolio level back test to less than 1 second,
>> >> > > allowing multi day optimization and walkforward runs to
>> >> > > complete in a more reasonable time, and also just to see how
>> >> > > fast I could get it to run.
>> >> > >
>> >> > > The backtest runs over 1 year of 5 minute bars for about 1000
>> >> > > symbols. 1 year of data normally takes 25 seconds for
>> >> > > AmiBroker alone, or 18 seconds for 6 months of data. A typical
>> >> > > optimization run takes hundreds of these passes per walk
>> >> > > forward step, taking hours.
>> >> > >
>> >> > > Using the Nvidia CUDA API, running on my mid range video card,
>> >> > > it was much faster. Much, much, much faster. How fast?
>> >> > >
>> >> > > It reduced the run time from 25s to... 4.4ms. That is more
>> >> > > than 200/s!
>> >> > >
>> >> > > I didn't believe the timing when I saw it at first. So, I put
>> >> > > 1,000 runs in a loop and sure enough, it ran 1,000 iterations
>> >> > > in about 4 1/2 seconds.
>> >> > > This far exceeded my goal and expectations.
>> >> > >
>> >> > > The resulting trade list matches that obtained by the AFL
>> >> > > version of this code.
>> >> > >
>> >> > > I estimate that it is processing 32GB of bar data/sec.
>> >> > >
>> >> > > Getting this to work at peak performance was tricky. Most of
>> >> > > what I have learned about code optimization does not apply.
>> >> > >
>> >> > > It uses AmiBroker to load the symbol data and perform
>> >> > > calculations that do not depend on the optimization
>> >> > > parameters. Once loaded into video memory, repeated passes can
>> >> > > be made with different parameters, avoiding any overhead.
>> >> > >
>> >> > > For non backtest/optimization runs, the code just evaluates
>> >> > > one symbol and passes the data back to AmiBroker
>> >> > > buy/sell/short/cover arrays, making it easy to test, validate
>> >> > > and visualize the trades. There is very little performance
>> >> > > gain in this case.
>> >> > >
>> >> > > There are problems, however. To run optimizations at peak
>> >> > > speed, I cannot use AmiBroker to calculate the optimization
>> >> > > goal function. So, I am in the process of writing code to
>> >> > > match signals and calculate the portfolio fitness function.
>> >> > > Once I do this, I will be able to perform full optimizations
>> >> > > and walk forwards 3 orders of magnitude faster than is
>> >> > > possible with AmiBroker alone.
>> >> > >
>> >> > > Also, this is not general purpose code. Changing the system
>> >> > > code means changing a dll written in C. However, there is no
>> >> > > reason that this could not be made more general.
>> >> > >
>> >> > > I have made some prototypes of "Cuda" versions of basic AFL
>> >> > > functions. The idea is to queue the function calls into a
>> >> > > definition executed by a micro kernel running on the graphics
>> >> > > cores.
>> >> > > The result would be the ability to use the full power of the
>> >> > > graphics cores by modifying AFL code to use Cuda aware
>> >> > > versions with no changes to C code. It would be an
>> >> > > interesting, but big project.
>> >> > >
>> >
>> > ------------------------------------
>> >
>> > Please note that this group is for discussion between users only.
>> >
>> > To get support from AmiBroker please send an e-mail directly to
>> > SUPPORT {at} amibroker.com
>> >
>> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
>> > http://www.amibroker.com/devlog/
>> >
>> > For other support material please check also:
>> > http://www.amibroker.com/support.html
>> > Yahoo! Groups Links
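The idea from the original post of queuing function calls into a definition that a micro kernel then executes can be illustrated with a small deferred-execution sketch. Everything here is hypothetical: OpQueue, its load/ma/gt methods and the run interpreter are invented for the example and are not AFL, CUDA or AmiBroker APIs. The integer slot returned by each call plays the role of the "handle" to an array that stays on the device.

```python
class OpQueue:
    """Records operations instead of running them (hypothetical API)."""

    def __init__(self):
        self.ops = []  # list of (opcode, input slots, constant)

    def _emit(self, opcode, inputs, const=None):
        self.ops.append((opcode, inputs, const))
        return len(self.ops) - 1  # slot index acts as the array handle

    def load(self, field_name):
        return self._emit("load", (), field_name)

    def ma(self, src, periods):
        return self._emit("ma", (src,), periods)

    def gt(self, a, b):
        return self._emit("gt", (a, b))

def run(queue, bars):
    """Stand-in for the micro kernel: interpret the queued ops in order."""
    slots = []
    for opcode, inputs, const in queue.ops:
        if opcode == "load":
            out = list(bars[const])
        elif opcode == "ma":
            src = slots[inputs[0]]
            # Simple moving average; early bars average what is available.
            out = [sum(src[max(0, i - const + 1): i + 1]) / min(i + 1, const)
                   for i in range(len(src))]
        elif opcode == "gt":
            a, b = slots[inputs[0]], slots[inputs[1]]
            out = [1 if x > y else 0 for x, y in zip(a, b)]
        else:
            raise ValueError(opcode)
        slots.append(out)
    return slots

# Building the definition runs nothing; it only records four operations.
q = OpQueue()
close = q.load("close")
fast = q.ma(close, 2)
slow = q.ma(close, 3)
buy = q.gt(fast, slow)

slots = run(q, {"close": [1, 2, 3, 2, 1]})
print(slots[buy])  # [0, 0, 1, 1, 0]
```

Because the script only builds a list of operations, the same definition could in principle be replayed many times against data that never leaves video memory, which is the property the post is after.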