On Wed, Jan 5, 2011 at 9:21 AM, David St John <[email protected]> wrote:
> Mark,
>
> I would suggest thinking about correlation between the returns. For
> example, using daily data for SPY, DJI, QQQQ, I see the correlation in the
> returns using the following calls:
>
>> library(quantmod)
>> getSymbols(c('SPY','^DJI','QQQQ'))
> [1] "SPY" "DJI" "QQQQ"
>> x1 <- log(as.vector(as.matrix(SPY)[,4])/(as.vector(as.matrix(SPY)[,1])))
>> x2 <- log(as.vector(as.matrix(QQQQ)[,4])/(as.vector(as.matrix(QQQQ)[,1])))
>> x3 <- log(as.vector(as.matrix(DJI)[,4])/(as.vector(as.matrix(DJI)[,1])))
>> data <- cbind(x1,x2,x3)
>> cor(data)
>           x1        x2        x3
> x1 1.0000000 0.9067068 0.8284568
> x2 0.9067068 1.0000000 0.7556838
> x3 0.8284568 0.7556838 1.0000000
>
> In this example I just used log(close/open) as the one-period return.
>
> So, if you want to see if your systems are correlated, I would suggest
> defining the systems' return series by multiplying the one period returns
> with the position indicated by the system.
>
> I generated a random 'system' as a sequence of -1, 0, and 1 values as an
> example:
>
>> z1 <- rnorm(length(x1))
>> z2 <- rnorm(length(x2))
>> z3 <- rnorm(length(x3))
>> s1 <- ifelse(z1>1,1,0)-ifelse(z1<-1,1,0)
>> s2 <- ifelse(z2>1,1,0)-ifelse(z2<-1,1,0)
>> s3 <- ifelse(z3>1,1,0)-ifelse(z3<-1,1,0)
>> signal <- cbind(s1,s2,s3)
>> cor(data*signal)
>           x1        x2        x3
> x1 1.0000000 0.7682398 0.6788077
> x2 0.7682398 1.0000000 0.6088950
> x3 0.6788077 0.6088950 1.0000000
>
> As you can see, even if you were just randomly buying and selling these
> contracts at odd times, but most of the time had no position (68% of the
> time in my example), then the systems will still be very strongly
> correlated. Since you are trying to find comparatively lower correlation
> between market/system pairs, I think this would be as fine a measure as
> any. Just look for the smallest entries in your correlation matrix.
>
> Hope this helps,
> -David

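For anyone who wants to try the quoted approach without downloading data, here is a minimal sketch in base R. The returns are simulated stand-ins for x1, x2 and x3 (an assumption, not the SPY/QQQQ/DJI series), and make_signal is a hypothetical helper name. One thing to watch when retyping the signal lines: R parses `z1<-1` written without spaces as the assignment `z1 <- 1`, so the sketch writes the comparison as `z < -1`. With the comparison written that way the random positions are independent and symmetric, so the off-diagonal correlations in this sketch can come out far lower than for the raw returns.

```r
set.seed(42)                      # reproducible illustration only
n <- 1000

# Simulated one-period returns for three correlated markets
# (placeholders for the x1, x2, x3 series in the quoted message)
common <- rnorm(n, sd = 0.01)
data <- cbind(x1 = common + rnorm(n, sd = 0.004),
              x2 = common + rnorm(n, sd = 0.005),
              x3 = common + rnorm(n, sd = 0.006))

# Random positions in {-1, 0, 1}; note the spaces in `z < -1`,
# which keep it a comparison rather than an assignment
make_signal <- function(z) ifelse(z > 1, 1, 0) - ifelse(z < -1, 1, 0)
signal <- apply(matrix(rnorm(n * 3), ncol = 3), 2, make_signal)
colnames(signal) <- c("s1", "s2", "s3")

# System return = one-period market return times the system's position
system_returns <- data * signal

flat_share  <- colMeans(signal == 0)   # fraction of periods with no position
cor_systems <- cor(system_returns)

print(round(flat_share, 2))
print(round(cor(data), 3))
print(round(cor_systems, 3))
```

With real systems, `signal` would hold each system's actual position per period instead of random draws, and the smallest off-diagonal entries of `cor_systems` would point at the least correlated market/system pairs.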
David,

Thanks, that helps a lot. I like it because it's simple, the amount of data being correlated isn't huge (daily returns for a few years, in sample, out of sample, etc.), and it should scale to hourly or some other time scale fairly easily. I'll likely set up my data to take a cut at this method in the next day or two and post back some results.

The one thing I don't have yet is how to pick the 'best' 5 out of 50, where 'best' means whatever I'm interested in (profit, low drawdowns, etc.). I can write a fitness function for what interests me. The next step is finding a good way to maximize what I consider 'best'.

I currently have maybe a hundred trading systems to poke around with in this manner. I'd like to find 5 that work well together, doing different things to earn their keep, one making money when another isn't.

There is a thread on the main R-users list right now called "Cost-benefit/value for money analysis" that might be appropriate if I let the value be how these respond to a custom fitness function and limit the number chosen to 5.

Cheers,
Mark

_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance
-- Subscriber-posting only. If you want to post, subscribe first.
-- Also note that this is not the r-help list where general R questions should go.
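
As a rough illustration of the "pick 5 by a custom fitness function" idea, here is one possible sketch using greedy forward selection. Everything in it is an assumption for illustration: the returns are simulated, the names rets, max_drawdown, fitness and greedy_pick are hypothetical, and the fitness (total return of the equal-weight combination divided by its maximum drawdown) is only one of many choices. Greedy selection is not guaranteed to find the best subset; with 50 systems there are choose(50, 5) = 2118760 subsets of size 5, so an exhaustive search may also be feasible.

```r
set.seed(1)
n_days <- 500
n_sys  <- 20                      # hypothetical; the thread mentions 50 to 100

# Simulated daily system returns, one column per system (placeholder data)
rets <- matrix(rnorm(n_days * n_sys, mean = 0.0003, sd = 0.01), ncol = n_sys)
colnames(rets) <- paste0("sys", seq_len(n_sys))

# Largest peak-to-trough drop of the cumulative return curve
max_drawdown <- function(r) {
  equity <- c(0, cumsum(r))
  max(cummax(equity) - equity)
}

# Hypothetical fitness: total return of the equal-weight combination of the
# chosen systems, divided by that combination's maximum drawdown
fitness <- function(cols) {
  port <- rowMeans(rets[, cols, drop = FALSE])
  sum(port) / max(max_drawdown(port), 1e-8)
}

# Greedy forward selection: at each step add the system that gives the
# highest fitness when combined with those already chosen
greedy_pick <- function(k) {
  chosen <- character(0)
  while (length(chosen) < k) {
    candidates <- setdiff(colnames(rets), chosen)
    scores <- sapply(candidates, function(s) fitness(c(chosen, s)))
    chosen <- c(chosen, candidates[which.max(scores)])
  }
  chosen
}

best5 <- greedy_pick(5)
print(best5)
print(fitness(best5))
print(round(cor(rets[, best5]), 2))   # check how correlated the picks are
```

Because the fitness is computed on the combined return series, systems that make money at different times tend to be rewarded through a smaller drawdown, which is the "one making money when another isn't" behaviour described above.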
