Hi,
I'm stuck on getting decent performance for the following problem.
Say we have a list of all 5-card hands from a deck of 52 cards:
load '~addons/stats/base/combinatorial.ijs'
$allHands=. 5 comb 52
2598960 5
We also have a list of all 4-card combinations:
$HCs=. 4 comb 52
270725 4
I want to find all hands possible using any two cards from a given row of *HCs
*but excluding those with either of the other two cards from the same row.
For example, for this randomly-chosen row of *HCs*, which hands include the
first two but exclude the second two?
(]{~[:?#) HCs
15 22 33 44
(#,+/) include=. 2=+/"1 allHands e."1 ] *15 22*
2598960 19600
(#,+/) exclude=. 0=+/"1 allHands e."1 ] *33 44*
2598960 480200
So there are 17,296 hands we want:
+/include*.exclude
17296
Check five at random to verify that these look correct:
]rcix=. (]{~[:5&?#) I. include*.exclude
1002358 1165504 2176134 1960355 56685
rcix{allHands
4 15 22 37 39
5 15 22 34 45
15 18 20 22 39
12 15 22 27 39
0 4 6 15 22
These look good because all include (15,22) but exclude any of (33,44).
However, this is only one possibility for this single row of *HCs* but we
have six variations for this row based on choosing all possible pairs of
indexes into the row of *HCs* to include,
|:inclIx=. 2 comb 4
0 0 0 1 1 2
1 2 3 2 3 3
And six corresponding pairs to exclude (indexes into the row of *HCs*):
|:exclIx=. (i.4)-."1 ] 2 comb 4
2 1 1 0 0 0
3 3 2 3 2 1
In the above example, we did the one of the six corresponding to *inclIx *of
(0,1) and *exclIx *of (2,3). Is there a way to do all six sets for each
row of *HCs* compared to each row of *allHands*?
We might start by reducing the sizes of our arrays by making them character
instead of integer:
allHands=. allHands{a. [ HCs=. HCs{a.
However, we quickly see that we run out of memory for the expression we
would like to write:
include=. 2=+/"1 allHands e."1 / inclIx{"1 / HCs
|out of memory
| include=.2=+/"1 allHands e."1/inclIx{"1/HCs
A little arithmetic shows us the extent of the problem:
(],*/) #&>(,/inclIx{"1/HCs);<allHands
1624350 2598960 4221620676000
10^.1624350*2598960
12.6255
So we have a 4-trillion-element intermediate result which is a bit much.
Our immediate thought might be to do this in pieces by selecting successive
portions of the right-hand part (*HCs*) of this expression. We would like
to work on integral pieces for neatness, so we look at the prime factors of
the size of *HCs* and select the product of a couple of them to be the
"chunk size":
q:#HCs
5 5 7 7 13 17
ixs=. (ctr*chsz)+i.chsz [ ctr=. 0 [ chsz=. */5 5
[The expression for the next chunk would be
ixs=. (ctr*chsz)+i.chsz [ ctr=. >:ctr
and so on.]
Timing how long one piece takes and scoring the results by simply adding up
their indexes into *allHands*, we get this:
6!:2 'sel=. (2=+/"1 allHands e."1 / inclIx{"1 / ixs{HCs) *. 0=+/"1
allHands e."1 / exclIx{"1 / ixs{HCs'
11.5105
scores=. 0$~#HCs NB. Initialize scores vector
6!:2 'theseScores=. +/&>I.&.><"1 |:+./"1 ] 0 2 1|:sel'
0.687056
6!:2 'scores=. (theseScores) ixs}scores'
6.4e_6
+/11.5105 0.687056 6.4e_6 NB. Total time per chunk
12.1976
12.1976*(#ixs)%~#HCs NB. Total estimated time
132088
0 60 60#:12.1976*(#ixs)%~#HCs NB. Estimated time: h m s
36 41 27.8104
So we should be able to do it this way in about 36 hours. Can someone
think of a faster method to accomplish this? The memory requirements seem
modest enough that I should be able to run five to ten processes
simultaneously on this to reduce the overall time but is there a way to
speed up the basic selection?
Thanks,
Devon
--
Devon McCormick, CFA
Quantitative Consultant
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm