On 02/09/2011 05:09 PM, Mark Knecht wrote:
On Wed, Jan 5, 2011 at 9:21 AM, David St John<[email protected]> wrote:
<SNIP>
          x1        x2        x3
x1 1.0000000 0.7682398 0.6788077
x2 0.7682398 1.0000000 0.6088950
x3 0.6788077 0.6088950 1.0000000
As you can see, even if you were just buying and selling these
contracts at random times, and had no position most of the time (68% of
the time in my example), the systems would still be very strongly
correlated. Since you are trying to find comparatively lower correlation
between market/system pairs, I think this would be as fine a measure as
any. Just look for the smallest entries in your correlation matrix.
Hope this helps,
-David
Hi all,
Back from oblivion. I finally got some time over the last few days
to try and put some shape to this problem. Most of this is background.
Skip to the bottom to get the question about R.
REVIEW: I have saved historical trade results from a large set of
systems. (In this case 30) Every file used as input to this R program
comes from a completely different trading system operating on the same
market. (In this case about 15 months trading the Russell futures
market - TF.D in TradeStation) I have aggregated the data into
results/day and placed them in a large array. Looking at just the
first 10 systems for the first 20 days the results look like this:
(unlikely to survive email very well...)
MyResults[1:20,1:11]
   TradingDate   PL1   PL2   PL3   PL4   PL5   PL6   PL7   PL8   PL9  PL10
 1  2009-06-01     0     0     0     0     0     0     0     0     0     0
 2  2009-06-02     0     0     0     0     0     0     0     0     0     0
 3  2009-06-03     0     0     0     0     0     0     0     0     0     0
 4  2009-06-04     0   341     0     0   -89  -609     0     0  1001     0
 5  2009-06-05     0  -569  -569  -299   333   161     0   151     0  -109
 6  2009-06-08     0    12  -189  -189   471  -189     0   251     0     0
 7  2009-06-09     0    81  -149   -79     0  -159  -379     0     0     0
 8  2009-06-10    91  -799  -119  -999     0    11     0     0   401     0
 9  2009-06-11     0  -249   191     0   271  -289  -979   571   -49  -639
10  2009-06-12     0   391  -449     0     0   391     0     0   741     0
11  2009-06-15   582   351   591   343     0   581     0     0   291   351
12  2009-06-16     0  -109     0 -1137   612   941     0  1201  1291     0
13  2009-06-17     0   171   151  -329   401  -339     0   451   581     0
14  2009-06-18     0   531  -169   -19     0   552  -109     0     0     0
15  2009-06-19     0  -429  -429  -309   352   271     0  -309   401    -9
16  2009-06-22   592  1342   621   504     0  1222     0     0     0   301
17  2009-06-23     0     0     0   564     0  -429     0     0     0     0
18  2009-06-24    61  -379   -59     0   401    81     0   401     0     0
19  2009-06-25     0  1811     0     0 -1029  1782   801   381  -559   261
20  2009-06-26     0    71    71    71   241     0   241   441    -9    91
Here's an idea of the system results from a returns point of view:
colSums(CorData)
  PL1   PL2   PL3   PL4   PL5   PL6   PL7   PL8   PL9  PL10
24065 35073 36471 18273 19257 30939 19296 24073 27412 20512
 PL11  PL12  PL13  PL14  PL15  PL16  PL17  PL18  PL19  PL20
29211 28338 29093 23487 36062 18125 21926 34095 30921 32945
 PL21  PL22  PL23  PL24  PL25  PL26  PL27  PL28  PL29  PL30
30546 18295 30707 27885 15864 23497 34502 31098 28028 30616
mean(colSums(CorData))
[1] 27020
sd(colSums(CorData))
[1] 6054
(Not sure mean and sd make sense for this, but that's about the limit
of my understanding of statistics...) ;-)
OK, so here's the R question. Let's say I'm willing to trade 5 systems
at one time and I have to pick 5. How do I pick?
1) Find the 5 with the largest equity. Too easy. Don't need R to do that.
2) Find some sort of 'optimized' choice...
Looking at the correlation of just the first 10 systems I get this:
CorResults[1:10,1:10]
         PL1     PL2     PL3     PL4     PL5     PL6     PL7     PL8     PL9    PL10
PL1   1.0000   0.199  -0.030 -0.1534 -0.0433  -0.047 -0.0017 -0.0542   0.052  0.0079
PL2   0.1991   1.000   0.090  0.1306 -0.0398   0.029  0.4504 -0.1713  -0.310  0.5159
PL3  -0.0299   0.090   1.000  0.1848  0.2170   0.157  0.1718  0.3185   0.083  0.4330
PL4  -0.1534   0.131   0.185  1.0000  0.0063   0.170  0.2640  0.0639  -0.178  0.2653
PL5  -0.0433  -0.040   0.217  0.0063  1.0000   0.075 -0.0168  0.1581   0.116  0.1855
PL6  -0.0470   0.029   0.157  0.1700  0.0751   1.000  0.0145  0.1401   0.011  0.0912
PL7  -0.0017   0.450   0.172  0.2640 -0.0168   0.014  1.0000 -0.0666  -0.207  0.5081
PL8  -0.0542  -0.171   0.319  0.0639  0.1581   0.140 -0.0666  1.0000   0.220  0.0098
PL9   0.0520  -0.310   0.083 -0.1781  0.1157   0.011 -0.2071  0.2195   1.000 -0.2296
PL10  0.0079   0.516   0.433  0.2653  0.1855   0.091  0.5081  0.0098  -0.230  1.0000
I'd like to take this to the next level where I pick a group of 5 that
has the lowest overall correlation, and a second group of 5 that has
the highest overall correlation. From that I'll calculate an aggregate
equity curve and then look at things like ROA on the totals as a way
to evaluate how I feel about the groups.
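A minimal sketch of the aggregate equity curve for one group, in base R. The P&L matrix here is simulated, and the names `pl`, `group`, and `max_drawdown` are illustrative assumptions rather than objects from the original program. Maximum drawdown is included only as one possible way to judge the curve; it is not the ROA figure mentioned above.

```r
# Simulated stand-in for the daily P&L table (values are made up).
set.seed(1)
n_days <- 250
pl <- matrix(round(rnorm(n_days * 30, mean = 100, sd = 400)),
             nrow = n_days,
             dimnames = list(NULL, paste0("PL", 1:30)))

# Hypothetical group of five systems; substitute whichever group you select.
group <- c("PL2", "PL3", "PL15", "PL18", "PL27")

# Daily P&L of the group, then its running total (the equity curve).
group_daily <- rowSums(pl[, group])
equity <- cumsum(group_daily)

# Maximum drawdown: largest drop of the curve below its running peak,
# with the starting equity of 0 counted as the first peak.
running_peak <- cummax(c(0, equity))[-1]
max_drawdown <- max(running_peak - equity)
```

`plot(equity, type = "l")` would then draw the curve for a visual comparison between groups.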
QUESTION:
How can I find the 5 systems in the correlation matrix whose pairwise
correlations, summed over all possible pairs within the 5, give me the
largest or smallest value?
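One way to answer the question as literally asked is a brute-force search: enumerate every group of 5 with `combn()`, sum the 10 pairwise correlations inside each group, and keep the extremes. The sketch below uses simulated data and only 12 systems so that it runs instantly; `pl`, `cor_mat`, and `group_score` are assumed names, not objects from the original program.

```r
# Simulated daily P&L for a small number of systems (values are made up).
set.seed(42)
n_days <- 300
n_sys <- 12
pl <- matrix(rnorm(n_days * n_sys), nrow = n_days,
             dimnames = list(NULL, paste0("PL", seq_len(n_sys))))
cor_mat <- cor(pl)

# Score of one group: sum of the pairwise correlations among its members
# (upper triangle only, so each pair counts once and the diagonal is skipped).
group_score <- function(idx, cm) {
  sub <- cm[idx, idx]
  sum(sub[upper.tri(sub)])
}

# Every possible group of 5, one per column, and the score of each.
combos <- combn(ncol(cor_mat), 5)
scores <- apply(combos, 2, group_score, cm = cor_mat)

# Groups with the smallest and largest summed correlation.
lowest  <- colnames(cor_mat)[combos[, which.min(scores)]]
highest <- colnames(cor_mat)[combos[, which.max(scores)]]
```

With 30 systems there are choose(30, 5) = 142506 groups, which is still small enough to enumerate this way. Note that a plain sum rewards negative correlations; summing `abs()` of the entries instead would look for groups closest to uncorrelated.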
I would ask a different question.
I assume that the P&L of your strategies is not normally distributed,
so minimizing covariances would also minimize positive skew, which is
likely not the outcome that you want.
Here's a simplified/modified version of a process I've used in the past
to choose among 'similar' systems (in this case, similar in that they
all operate on the same market).
1. First eliminate the ones with 'bad' performance based on criteria you
set (drawdowns, worst day, percent winning/losing days, etc.).
2. Take your two 'best' performers. 'Best' may be judged on P&L alone or
in combination with other statistics.
3. Pick the other 3 by looking for low/inverse correlations to your two
best, or alternatively just pick the five best.
4. Run a portfolio optimization to decide 'how much' to allocate to each
system. I use an objective like 'maximize return subject to minimizing
drawdowns and minimizing risk concentration (component ES, i.e. expected
shortfall), while keeping my total drawdown and 1-day ES (95%, 1 in 20,
about 1 day per month) below a specific threshold'.
5. Trade your five systems using those portfolio weights, converted
back into maximum position sizes.
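Steps 1 to 3 above can be sketched in base R as follows. This is only an illustration under stated assumptions: the data are simulated, the screening thresholds (the 75th percentile of drawdown and the 10th percentile of worst day) are arbitrary placeholders, and 'best' is taken to mean total P&L. None of these choices come from the process described above.

```r
# Simulated daily P&L for 30 systems (values are made up).
set.seed(7)
n_days <- 300
pl <- matrix(rnorm(n_days * 30, mean = 0.1), nrow = n_days,
             dimnames = list(NULL, paste0("PL", 1:30)))

# Step 1: screen out 'bad' systems. The thresholds are placeholders.
max_dd <- function(x) {
  eq <- cumsum(x)
  max(cummax(c(0, eq))[-1] - eq)   # largest drop below the running peak
}
drawdowns <- apply(pl, 2, max_dd)
worst_day <- apply(pl, 2, min)
keep <- drawdowns <= quantile(drawdowns, 0.75) &
        worst_day >= quantile(worst_day, 0.10)
candidates <- pl[, keep, drop = FALSE]

# Step 2: the two best survivors, judged here on total P&L only.
totals <- sort(colSums(candidates), decreasing = TRUE)
best_two <- names(totals)[1:2]

# Step 3: the three remaining systems with the lowest average
# correlation to the two best.
cm <- cor(candidates)
others <- setdiff(colnames(candidates), best_two)
avg_cor <- rowMeans(cm[others, best_two, drop = FALSE])
next_three <- names(sort(avg_cor))[1:3]

chosen <- c(best_two, next_three)
```

Sorting `avg_cor` in increasing order puts inversely correlated systems first, which matches the 'low/inverse' wording of step 3.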
If you have the capital, I'd skip steps 2 & 3 entirely, and let the
optimizer decide 'how much'.
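For reference, the 1-day 95% ES and the per-system ES contributions mentioned in step 4 can be estimated from history in a few lines of base R. This is a plain historical estimate on simulated data with hypothetical equal weights, not the optimizer or the exact risk measure used in the process above; all object names are assumptions.

```r
# Simulated daily P&L for five chosen systems (values are made up).
set.seed(11)
n_days <- 300
pl <- matrix(rnorm(n_days * 5, mean = 50, sd = 400), nrow = n_days,
             dimnames = list(NULL, paste0("PL", 1:5)))
weights <- rep(0.2, 5)   # hypothetical equal allocation

# Weighted P&L per system, and the portfolio's daily P&L.
weighted <- sweep(pl, 2, weights, `*`)
port <- rowSums(weighted)

# Historical 95% VaR is the 5% quantile of daily P&L;
# ES is the average P&L on the days at or below that level.
var_95 <- as.numeric(quantile(port, 0.05))
tail_days <- port <= var_95
es_95 <- mean(port[tail_days])

# Component ES: each system's average weighted P&L on those tail days.
# The components add up to the portfolio ES.
component_es <- colMeans(weighted[tail_days, , drop = FALSE])
```

A system whose entry in `component_es` dominates the total is where the tail risk is concentrated, which is the quantity a 'minimize risk concentration' objective would try to even out.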
Regards,
- Brian
--
Brian G. Peterson
http://braverock.com/brian/
Ph: 773-459-4973
IM: bgpbraverock
_______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-sig-finance