Thanks Paul.

----- Original Message ----- 
From: "Paul Ho" <[EMAIL PROTECTED]>
To: <amibroker@yahoogroups.com>
Sent: Wednesday, August 06, 2008 9:16 AM
Subject: [amibroker] Re: Freakishly fast backtest using 64 cores


> Click on an individual chipset and you'll get the manufacturers that
> are using that chipset.
> --- In amibroker@yahoogroups.com, "Paul Ho" <[EMAIL PROTECTED]> wrote:
>>
>> http://www.nvidia.com/object/cuda_learn_products.html
>> 
>> 
>>   _____  
>> 
>> From: amibroker@yahoogroups.com [mailto:[EMAIL PROTECTED] On Behalf Of cstrader
>> Sent: Wednesday, 6 August 2008 10:51 PM
>> To: amibroker@yahoogroups.com
>> Subject: Re: [amibroker] Re: Freakishly fast backtest using 64 cores
>> 
>> 
>> 
>> Which video cards provide this feature? As far as I can tell, it's only
>> the 8-Series (G8X) GPU from NVIDIA, found in the GeForce, Quadro and
>> Tesla lines. Who will have one of these? Are many people likely to have
>> them in the future?
>> 
>> Thanks
>> 
>> ----- Original Message ----- 
>> From: "dloyer123" <[EMAIL PROTECTED]>
>> To: <amibroker@yahoogroups.com>
>> Sent: Wednesday, August 06, 2008 12:22 AM
>> Subject: [amibroker] Re: Freakishly fast backtest using 64 cores
>> 
>> > Very good question. That was a head scratcher.
>> >
>> > So the thing is, AmiBroker does a lot more work in an optimization
>> > pass than executing AFL code. In fact, the AFL code may take very
>> > little of the total run time.
>> >
>> > As an example, using a database with a good amount of data, write an
>> > AFL file that does nothing but set the buy/sell/short/cover arrays
>> > to 0. The backtest will still take a good bit of time.
>> >
>> > So, even reducing the AFL run time to zero is not enough. It will
>> > not help much at all.
>> >
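The point above is Amdahl's law: if AFL execution is only a small share of a pass, making it infinitely fast barely moves the total. A minimal sketch of that arithmetic follows. The 20% AFL share is a hypothetical figure chosen for illustration; the thread does not say what the real share is.

```python
def overall_speedup(afl_fraction: float, afl_speedup: float) -> float:
    """Amdahl's law: speedup of a whole pass when only the AFL share gets faster."""
    remaining = (1.0 - afl_fraction) + afl_fraction / afl_speedup
    return 1.0 / remaining

# Hypothetical split: assume AFL execution is 20% of a backtest pass.
# The thread does not give the real share; 0.2 is only an illustration.
AFL_FRACTION = 0.2

for s in (2, 10, 1000, float("inf")):
    total = overall_speedup(AFL_FRACTION, s)
    print(f"AFL {s}x faster -> whole pass {total:.2f}x faster")
```

Under that assumed split, even a zero-cost AFL step caps the whole pass at 1.25x, which is why the rest of the pass has to be bypassed as well.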
>> > So, to avoid this, I pass a "mode" variable to my Dll. This mode is
>> > set by a simple optimization statement:
>> >
>> > mode = optimize("mode",0,1,3,1);
>> >
>> > When mode = 0, the Dll will evaluate one symbol like a normal Dll.
>> > So if I click on a bar, it will update my printf statements, etc.
>> > buy/sell/short/cover arrays are set. A single backtest (not
>> > optimize) will use the normal AmiBroker trade match and evaluate
>> > code and generate stats as normal.
>> >
>> > When mode = 1, this means load the data. The Dll will copy the price
>> > data to a staging area in memory. buy/sell/short/cover are set to 0
>> > to generate no trades. Having AmiBroker align the symbol bars was a
>> > big help here.
>> >
>> > When mode = 2, on the first symbol and the first symbol only, it
>> > loads the price data to the video card and executes as many backtest
>> > passes as it needs at a few ms per pass. Once the best combination
>> > is found it returns. buy/sell/short/cover are set to 0. Note that I
>> > cannot use the AmiBroker signal match and fitness function code. I
>> > have to provide my own. This is where the performance advantage of
>> > all of the extra cores comes into play. It may run hundreds or
>> > thousands of parameter combinations very quickly. I can't use the
>> > built-in optimize support, but brute force is enough for now. After
>> > all, I get 200 combinations per second.
>> >
>> > When mode = 3, each symbol evaluates using the best params found on
>> > the last mode=2 run. buy/sell/short/cover are set. In a walkforward
>> > test, this will always have the best score and be used for the
>> > walkforward step. A custom backtest function adds the chosen
>> > parameters to the backtest report. Mode 3 works like mode 0 except
>> > it uses the optimal parameters rather than default values.
>> >
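The four modes described above amount to a small state machine that is called once per symbol per optimization step. The sketch below models that control flow in plain Python rather than C or CUDA, purely to show the sequencing. Everything in it is hypothetical: the `ModePlugin` class, the `score_fn` fitness callback, and the choice to reset the search flag whenever new data is staged are illustrations, not the author's Dll.

```python
from itertools import product

class ModePlugin:
    """Toy stand-in for the Dll: one call per symbol per optimization step."""

    def __init__(self, param_grid, default_params, score_fn):
        self.param_grid = param_grid          # dict: parameter name -> candidate values
        self.default_params = default_params
        self.score_fn = score_fn              # hypothetical fitness: (staged, params) -> float
        self.staged = {}                      # symbol -> prices (the "staging area")
        self.best_params = None
        self.searched = False

    def call(self, mode, symbol, prices):
        """Return the params used to generate signals, or None for 'no trades'."""
        if mode == 0:                         # normal single-symbol evaluation
            return self.default_params
        if mode == 1:                         # stage the data, emit no trades
            self.staged[symbol] = list(prices)
            self.searched = False             # assumption: new data invalidates old search
            return None
        if mode == 2:                         # search once, on the first symbol only
            if not self.searched:
                self.best_params = self._brute_force()
                self.searched = True
            return None
        if mode == 3:                         # replay using the best params found
            return self.best_params
        raise ValueError(f"unknown mode {mode}")

    def _brute_force(self):
        """Try every combination in the grid and keep the best-scoring one."""
        names = list(self.param_grid)
        combos = (dict(zip(names, values))
                  for values in product(*(self.param_grid[n] for n in names)))
        return max(combos, key=lambda p: self.score_fn(self.staged, p))


# Hypothetical fitness that prefers lookback == 3; it stands in for a real backtest score.
plugin = ModePlugin({"lookback": [1, 2, 3, 4]}, {"lookback": 1},
                    lambda staged, p: -abs(p["lookback"] - 3) * len(staged))
for mode in (1, 2, 3):                        # the optimizer steps mode from 1 to 3
    for sym in ("AAA", "BBB"):                # each step visits every symbol
        result = plugin.call(mode, sym, [1.0, 2.0, 3.0])
print(result)
```

In this toy run the mode 3 calls return the `lookback` value picked by the mode 2 search, while modes 1 and 2 return `None`, mirroring the "arrays set to 0" behaviour in the description.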
>> > The Status("action") and Status("ActionEx") codes could also be
>> > used, but they did not tell me quite what I needed to know. Also, I
>> > could have avoided the mode=2 step if I could find a way to know I
>> > was on the last symbol and run the optimization then. I guess I
>> > could pass the name of the last symbol.
>> >
>> > So I use AmiBroker to load and keep the database, visualize the
>> > trades, validate, walkforward and provide deep metrics of the
>> > backtest.
>> >
>> > If I wanted to take this further, I would move the trade system
>> > logic out of the Dll and make it programmable from AFL. That way it
>> > could be used by anyone without needing to program C. I would do
>> > this by passing handles to CUDA arrays through the AFL code.
>> >
>> >
>> >
>> >
>> > --- In amibroker@yahoogroups.com, "Paul Ho" <paul.tsho@> wrote:
>> >>
>> >> Thanks for your insight.
>> >> I hope you don't mind sharing a little bit more detail.
>> >> You said:
>> >> > To get the best performance, my AFL code makes one pass over the
>> >> > data, calling a Dll. The Dll takes all of the data needed by the
>> >> > calculation and loads a copy to the video card. This upload is
>> >> > slow, the entire upload takes about 45 seconds for all 1000
>> >> > symbols.
>> >> >
>> >> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> >> > the graphics cores that performs the actual computation and
>> >> > generates the trade list.
>> >>
>> >> Normally AB loads the data from the database as needed, calls a
>> >> function in a Dll, and passes data in arrays or whatever as
>> >> arguments of the function. The function will be called for every
>> >> ticker in the watchlist, and data pertaining to that symbol is
>> >> passed each time. I wonder how you do a "single pass" over the
>> >> data, because AB passes the data as part of the argument regardless
>> >> of how many optimizations it has previously run with the same data.
>> >> I just wonder how you do it.
>> >> Cheers
>> >> Paul.
>> >>
>> >> --- In amibroker@yahoogroups.com, "dloyer123" <dloyer123@> wrote:
>> >> >
>> >> > This uses the mid-range video card that happened to come with my
>> >> > system, a 9800GT. The newer 260 and 280 cards are 3 to 4 times
>> >> > faster. The 260 can be found at Best Buy for $300. Some laptops
>> >> > have compatible cards as well.
>> >> >
>> >> > The video card has its own memory, mine has 512MB, some have as
>> >> > much as 1GB. This memory is very fast, once it is loaded from the
>> >> > main system. Nvidia has a professional line of products that have
>> >> > much more memory.
>> >> >
>> >> > To get the best performance, my AFL code makes one pass over the
>> >> > data, calling a Dll. The Dll takes all of the data needed by the
>> >> > calculation and loads a copy to the video card. This upload is
>> >> > slow, the entire upload takes about 45 seconds for all 1000
>> >> > symbols.
>> >> >
>> >> > Once all of the data is uploaded, the Dll loads a "kernel" into
>> >> > the graphics cores that performs the actual computation and
>> >> > generates the trade list. This part is very fast and performs all
>> >> > of the same functions that my AFL version does. The resulting
>> >> > trade list is the same.
>> >> >
>> >> > Because the data is loaded into video memory, it can be reused for
>> >> > many passes over the data with different optimization values. So,
>> >> > hundreds of combinations of optimization values can be tried per
>> >> > second.
>> >> >
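The upload-once, reuse-many trade-off can be put into numbers using figures quoted in this thread: about 45 seconds to upload all 1000 symbols, 4.4 ms per pass on the card, and 25 seconds per pass for AmiBroker alone. The sketch below is only a back-of-envelope model; it assumes the upload cost is paid once per data set and ignores any other overhead.

```python
UPLOAD_S = 45.0       # one-time upload for all 1000 symbols (figure from the thread)
GPU_PASS_S = 0.0044   # per-pass time on the video card (figure from the thread)
AFL_PASS_S = 25.0     # per-pass time for AmiBroker alone (figure from the thread)

def gpu_total(passes: int) -> float:
    """Total seconds when the data is uploaded once and reused for every pass."""
    return UPLOAD_S + passes * GPU_PASS_S

def afl_total(passes: int) -> float:
    """Total seconds when every pass runs through AmiBroker alone."""
    return passes * AFL_PASS_S

for n in (1, 2, 100, 1000):
    print(f"{n:5d} passes: GPU {gpu_total(n):8.1f} s   AFL {afl_total(n):8.1f} s")
```

On these numbers a single pass is slower with the upload included, and the approach starts to pay off from the second pass onward.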
>> >> > For non-optimization runs, the Dll just loads one symbol into
>> >> > video memory and processes it. Counting the overhead of moving
>> >> > data to the video card and extracting the trade list for a single
>> >> > symbol, the result is similar to AFL code alone. This lets me test
>> >> > the code and make sure it is correct.
>> >> >
>> >> > This approach works best when the data only needs to be loaded
>> >> > once, then "reused" many times. It also works best when there is a
>> >> > lot of data to work with.
>> >> >
>> >> > What is more interesting to me and what would be more useful for
>> >> > others would be a general driver that requires no Dll changes to
>> >> > modify the system. The performance would not be as good as hand
>> >> > optimized code, but would still be much better than AFL code
>> >> > alone. It would take trading system design to a whole new level.
>> >> > It would provide enough performance to make working with intraday
>> >> > data as easy as daily data is today.
>> >> >
>> >> > Writing such a driver would be hard, but I have already done some
>> >> > prototypes and design work. I am tempted to do it for my own use.
>> >> > If I made it available to others supporting it would be a PITA.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --- In amibroker@yahoogroups.com, "Paul Ho" <paul.tsho@> wrote:
>> >> > >
>> >> > > I'm very interested.
>> >> > > Could you elaborate a bit more?
>> >> > > What model of Nvidia chipset are you using, and with how much
>> >> > > memory?
>> >> > > Not sure exactly what you mean when you say:
>> >> > > "It uses AmiBroker to load the symbol data and perform
>> >> > > calculations that do not depend on the optimization parameters.
>> >> > > Once loaded into video memory, repeated passes can be made with
>> >> > > different parameters, avoiding any overhead."
>> >> > > Can you give me some examples? I presume when your Dll is
>> >> > > called, AB passes one or more arrays of data belonging to 1
>> >> > > symbol, is that true?
>> >> > > Not sure exactly what the rest means either. How many functions
>> >> > > are you running in your Dll, and what does each of them do?
>> >> > > Great of you to share your insight.
>> >> > > Cheers
>> >> > > Paul.
>> >> > >
>> >> > >
>> >> > >
>> >> > > _____
>> >> > >
>> >> > > From: amibroker@yahoogroups.com [mailto:amibroker@yahoogroups.com]
>> >> > > On Behalf Of dloyer123
>> >> > > Sent: Tuesday, 5 August 2008 9:19 AM
>> >> > > To: amibroker@yahoogroups.com
>> >> > > Subject: [amibroker] Freakishly fast backtest using 64 cores
>> >> > >
>> >> > >
>> >> > >
>> >> > > Greetings,
>> >> > >
>> >> > > I ported part of my AFL backtest code to a plugin that takes
>> >> > > advantage of the graphics math cores on the video card that are
>> >> > > normally used for 3D graphics.
>> >> > >
>> >> > > I was able to get a several thousand fold performance
>> >> > > improvement over AFL code alone.
>> >> > >
>> >> > > My goal was to reduce the 25 seconds AFL code alone uses for a
>> >> > > single portfolio level backtest to less than 1 second, allowing
>> >> > > multi-day optimization and walkforward runs to complete in a
>> >> > > more reasonable time, and also just to see how fast I could get
>> >> > > it to run.
>> >> > >
>> >> > > The backtest runs over 1 year of 5 minute bars for about 1000
>> >> > > symbols. 1 year of data normally takes 25 seconds for AmiBroker
>> >> > > alone, or 18 seconds for 6 months of data. A typical
>> >> > > optimization run takes hundreds of these passes per walk forward
>> >> > > step, taking hours.
>> >> > >
>> >> > > Using the Nvidia CUDA API, running on my mid-range video card,
>> >> > > it was much faster. Much, much, much faster. How fast?
>> >> > >
>> >> > > It reduced the run time from 25s to... 4.4ms. That is more than
>> >> > > 200/s!
>> >> > >
>> >> > > I didn't believe the timing when I saw it at first. So, I put
>> >> > > 1,000 runs in a loop and sure enough, it ran 1,000 iterations in
>> >> > > about 4 1/2 seconds. This far exceeded my goal or expectations.
>> >> > >
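The timing claims above can be cross-checked against each other with a few lines of arithmetic. This sketch uses only the two figures from the post (25 s per pass for AFL alone, 4.4 ms per pass on the card) and derives the speedup, the passes per second, and the expected time for the 1,000-iteration loop.

```python
afl_pass_s = 25.0      # AFL-only time per pass, from the post
gpu_pass_s = 4.4e-3    # CUDA time per pass, from the post

speedup = afl_pass_s / gpu_pass_s     # how many times faster one pass is
passes_per_s = 1.0 / gpu_pass_s       # passes completed per second
loop_1000_s = 1000 * gpu_pass_s       # expected time for the 1,000-run loop

print(f"speedup: about {speedup:,.0f}x")
print(f"passes per second: about {passes_per_s:.0f}")
print(f"1,000 passes: about {loop_1000_s:.1f} s")
```

The derived values agree with the post: a speedup of several thousand fold, more than 200 passes per second, and roughly 4 1/2 seconds for 1,000 iterations.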
>> >> > > The resulting trade list matches that obtained by the AFL
>> >> > > version of this code.
>> >> > >
>> >> > > I estimate that it is processing 32GB of bar data/sec.
>> >> > >
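The 32GB/sec estimate can be sanity-checked in rough terms. The post gives the symbol count, the bar interval and the pass time, but not the number of bars per symbol or the bytes touched per bar, so the sketch below assumes 252 trading days and 78 five-minute bars per day (a 6.5 hour session) and reads "GB" as 10^9 bytes. Those three values are assumptions, not figures from the thread.

```python
SYMBOLS = 1000                 # from the post
PASS_S = 4.4e-3                # from the post
CLAIMED_BYTES_PER_S = 32e9     # "32GB of bar data/sec", read as decimal GB (assumption)

# Assumptions, not from the post: 252 trading days per year and
# 78 five-minute bars per day (a 6.5 hour session).
DAYS = 252
BARS_PER_DAY = 78

bars_per_pass = SYMBOLS * DAYS * BARS_PER_DAY
bars_per_s = bars_per_pass / PASS_S
bytes_per_bar = CLAIMED_BYTES_PER_S / bars_per_s

print(f"{bars_per_pass:,} bars per pass")
print(f"{bars_per_s:.2e} bars per second")
print(f"{bytes_per_bar:.1f} bytes read per bar implied by the estimate")
```

Under those assumptions the estimate implies only a handful of bytes per bar per pass, which is the right order of magnitude for reading one or two 4-byte price fields.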
>> >> > > Getting this to work at peak performance was tricky. Most of
>> >> > > what I have learned about code optimization does not apply.
>> >> > >
>> >> > > It uses AmiBroker to load the symbol data and perform
>> >> > > calculations that do not depend on the optimization parameters.
>> >> > > Once loaded into video memory, repeated passes can be made with
>> >> > > different parameters, avoiding any overhead.
>> >> > >
>> >> > > For non backtest/optimization runs, the code just evaluates one
>> >> > > symbol and passes the data back to AmiBroker
>> >> > > buy/sell/short/cover arrays, making it easy to test, validate
>> >> > > and visualize the trades. There is very little performance gain
>> >> > > in this case.
>> >> > >
>> >> > > There are problems, however. To run optimizations at peak
>> >> > > speed, I cannot use AmiBroker to calculate the optimization goal
>> >> > > function. So, I am in the process of writing code to match
>> >> > > signals and calculate the portfolio fitness function. Once I do
>> >> > > this, I will be able to perform full optimizations and walk
>> >> > > forwards 3 orders of magnitude faster than is possible with
>> >> > > AmiBroker alone.
>> >> > >
>> >> > > Also, this is not general purpose code. Changing the system
>> >> > > code means changing a Dll written in C. However, there is no
>> >> > > reason that this could not be made more general.
>> >> > >
>> >> > > I have made some prototypes of "Cuda" versions of basic AFL
>> >> > > functions. The idea is to queue the function calls into a
>> >> > > definition executed by a micro kernel running on the graphics
>> >> > > cores. The result would be the ability to use the full power of
>> >> > > the graphics cores by modifying AFL code to use Cuda aware
>> >> > > versions with no changes to C code. It would be an interesting,
>> >> > > but big project.
>> >> > >
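The "queue the function calls into a definition" idea is a deferred-execution pattern: each call returns a handle that records what to do, and a small interpreter later runs the recorded steps in one go. The sketch below shows the pattern in plain Python on the CPU. The `DeferredArray` class, its `ma` and `gt` operations and the `run` interpreter are hypothetical names invented for this illustration; they are not the author's prototypes and not AmiBroker or CUDA APIs.

```python
class DeferredArray:
    """Handle that records operations instead of running them."""

    def __init__(self, ops):
        self.ops = ops                        # list of (operation name, argument) steps

    @classmethod
    def source(cls, name):
        """Start a definition from a named input array."""
        return cls([("load", name)])

    def ma(self, period):
        """Queue a simple moving average (shorter window at the start)."""
        return DeferredArray(self.ops + [("ma", period)])

    def gt(self, threshold):
        """Queue an elementwise 'greater than' comparison."""
        return DeferredArray(self.ops + [("gt", threshold)])


def run(definition, data):
    """Stand-in for the micro kernel: interpret the queued steps in order."""
    out = None
    for name, arg in definition.ops:
        if name == "load":
            out = list(data[arg])
        elif name == "ma":
            windows = [out[max(0, i - arg + 1): i + 1] for i in range(len(out))]
            out = [sum(w) / len(w) for w in windows]
        elif name == "gt":
            out = [1 if v > arg else 0 for v in out]
        else:
            raise ValueError(f"unknown operation {name}")
    return out


# Building the definition does no array work; only run() touches the data.
signal = DeferredArray.source("close").ma(2).gt(2.0)
print(signal.ops)
print(run(signal, {"close": [1.0, 2.0, 3.0, 4.0]}))
```

The appeal of the design is that the recorded definition can be shipped to the card once and replayed for each parameter combination, so changing the trading logic means changing the script that builds the definition rather than the C code.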
>> >> >
>> >>
>> >
>> >
>> >
>> > ------------------------------------
>> >
>> > Please note that this group is for discussion between users only.
>> >
>> > To get support from AmiBroker please send an e-mail directly to
>> > SUPPORT {at} amibroker.com
>> >
>> > For NEW RELEASE ANNOUNCEMENTS and other news always check DEVLOG:
>> > http://www.amibroker.com/devlog/
>> >
>> > For other support material please check also:
>> > http://www.amibroker.com/support.html
>> > Yahoo! Groups Links
>> >
>> >
>> >
>>
> 
> 
> 
