Please bear with the somewhat unusual subject matter. I am not sure whether this is (a) an interesting or (b) a tractable graph-theoretic problem, and I beg your collective indulgence!
The Six-Up (I believe known as Six-Pick in the USA), henceforth abbreviated to 6UP, is a type of horse racing wager where one tries to pick the first or second horse in each of six races. For a given race meeting, we might have a large number (80,000 would not be uncommon) of computer betting model-generated single 6UP bets, each of which looks like this:

R1 / R2 / R3 / R4 / R5 / R6 (amt)

where Rn is the runner picked for race leg n and amt is the amount wagered on this bet. As each leg is a separate race, the legs are independent: it is quite possible to have a bet like 1 / 1 / 1 / 1 / 1 / 1 (10). Generally, all legs of the 6UP would have 14 horses running, giving 14^6 = 7,529,536 possible combinations to wager on. So it is not at all uncommon to have a 'mere' 80,000 combinations showing a positive expectation and therefore being desirable wagers.

It *is* possible to send bets electronically to the totalizator system. However, it is not feasible to place 80,000+ individual bets in the available time. As is the case with all totalizator systems, there is a way to group single bets so that (traditionally) one paper betting ticket could represent multiple bets for the same wager amount. This can also be done for electronic bet submission. For example, the single bets:

4 / 7 / 12 / 1 / 1 / 9 (10)
4 / 8 / 12 / 1 / 1 / 9 (10)
4 / 8 / 12 / 1 / 1 / 14 (10)
4 / 7 / 12 / 1 / 1 / 14 (10)

could be grouped together as one bet:

4 / 7 + 8 / 12 / 1 / 1 / 9 + 14 (10)

(I have listed these bets in an order such that only one leg value changes between consecutive lines. Of this, more later.) This grouped bet would then be submitted to the totalizator system, where it would be expanded back into its constituent bets and the 4 single wagers recorded against our account. Ideally one would hope to merge many more single bets into one grouped bet and see multiple terms in each of the 6 legs of the grouped bet - thereby achieving a much greater 'compression ratio'.

GOAL: We wish to group or merge (depending on one's terminological preference) large numbers of these single bets quickly and efficiently.

(For the purposes of most of this outline, I am pretending that all 6UP single bets are for the same wager amount. In practice, this would not be so. Wager amount does matter, because I would not wish to group these bets together:

4 / 7 / 12 / 1 / 1 / 9 (100)
4 / 8 / 12 / 1 / 1 / 9 (10)
4 / 7 / 12 / 1 / 1 / 14 (10)
4 / 8 / 12 / 1 / 1 / 14 (10)

The grouping
4 / 7 + 8 / 12 / 1 / 1 / 9 + 14 (10)
would miss $90 of desired investment on 4 / 7 / 12 / 1 / 1 / 9, and, more seriously,
4 / 7 + 8 / 12 / 1 / 1 / 9 + 14 (100)
would over-invest $90 on each of the other three bets in the group, where we only want to bet $10.)

After floundering around with various naive approaches and flipping through Skiena's Algorithm Design Manual and some discussion group postings, it was suggested to me that I compute modified Hamming distances between each pair of bets in the set of merge-candidate single bets, where the modified Hamming distance is the number of legs in which the runners differ between two bets, e.g.:

1 / 1 / 1 / 1 / 1 / 1
1 / 1 / 14 / 1 / 1 / 1
--> modified Hamming distance between these two bets is 1, because they differ only in leg 3.

1 / 1 / 1 / 1 / 1 / 1
2 / 1 / 1 / 2 / 2 / 7
--> modified Hamming distance between these two bets is 4, because they differ in legs 1, 4, 5, and 6.
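To make the distance measure and the grouping mechanics concrete, here is a rough Python sketch (the function names are mine and purely illustrative). It computes the modified Hamming distance between two single bets, and expands a grouped bet back into its constituent singles in the way I understand the totalizator to do it:

from itertools import product

def mod_hamming(bet_a, bet_b):
    # Number of legs in which two single 6UP bets name different runners.
    # Each bet is a 6-tuple of runner numbers, e.g. (1, 1, 14, 1, 1, 1).
    return sum(1 for a, b in zip(bet_a, bet_b) if a != b)

def expand_group(grouped_bet):
    # Expand a grouped bet (one collection of runners per leg) back into
    # its constituent single bets, i.e. take the Cartesian product of the legs.
    return list(product(*grouped_bet))

# The examples above:
# mod_hamming((1, 1, 1, 1, 1, 1), (1, 1, 14, 1, 1, 1)) == 1
# mod_hamming((1, 1, 1, 1, 1, 1), (2, 1, 1, 2, 2, 7))  == 4
# expand_group(((4,), (7, 8), (12,), (1,), (1,), (9, 14))) gives the 4 singles listed earlier.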
If I construct an undirected graph where vertices are bets and edges join only those pairs of vertices having a modified Hamming distance of 1, then I have some kind of representation of potential grouping candidates. Furthermore, I have been able to discern some simple properties of such a graph.

Cliques in this graph describe the simplest form of grouping, having multiple terms in only one leg, such as:

1 / 14 / 9 / 4 / 11 / 1 + 8 + 9 + 10

i.e. if n vertices are all connected to each other by edges, then every pair of them has modified Hamming distance 1, which forces them all to differ from each other by one runner in the same leg position. (A rough code sketch of this simplest one-leg merging appears at the end of this post.)

Further reflection suggests that simple cycles in this graph describe groupings having multiple terms in two legs, such as:

4 / 7 + 8 / 12 / 1 / 1 / 9 + 14

A 4 / 7 / 12 / 1 / 1 / 9 (10)
B 4 / 8 / 12 / 1 / 1 / 9 (10)
C 4 / 8 / 12 / 1 / 1 / 14 (10)
D 4 / 7 / 12 / 1 / 1 / 14 (10)

(ABCDA is a simple cycle, as only one leg value changes between each consecutive pair of bets.)

These are a good start, but it is necessary to merge further. If I could find, for example,

2 / 14 / 9 / 4 / 11 / 1 + 8 + 9 + 10

elsewhere in my graph, I would then be able to combine it with the aforementioned

1 / 14 / 9 / 4 / 11 / 1 + 8 + 9 + 10

and get:

1 + 2 / 14 / 9 / 4 / 11 / 1 + 8 + 9 + 10

and so on. My unschooled (I have an EE, not CS, background) flounderings in the literature suggest that in effect I should be searching my graph for matchings of cycles and cliques. I leave out any notion of optimality for the time being, given the NP-hard odor about the whole undertaking.

I would very greatly appreciate:

(1) Any brief comments anybody might be able to make about the soundness of my approach. I would be grateful to know that I am not barking up the wrong tree and missing something obvious.

(2) In addition to (1), any pointers to papers which might refer to a similar problem. I have searched high and low but have not been able to find a direct analogue to this bet-grouping problem. Despite initially looking at string matching and various clustering and bioinformatics topics, I was not able to find one. Did I miss something?

(3) I am very new to graph theory and do not (this is probably obvious to the assembled company by now) have a strong background in mathematics. If I am missing something due to lacking the basic conceptual grammar, incorrect use of terminology, etc., I would appreciate any tips.

(4) If nobody can formulate the problem in a better way, how should I go about attempting to match cliques and cycles? I get the impression that this is not such a simple thing to do.

Finally, to give some notion of the size of the networks I am looking at, one fairly maximal (I hope) edge case had 143,000 vertices and 2.5 million edges when I ignored wager size. In practice, I would probably only create edges between two vertices (at modified Hamming distance 1) when their wager sizes were within an acceptable range (say +/- 10%) of each other. Adding this constraint results in a graph consisting of multiple connected components, each unconnected with the others. This would tend to reduce the aggregate number of edges quite significantly. Presumably I would then attack each of these self-contained components and attempt to extract fairly optimal bet groupings from them - issues of wager size having been dealt with at one fell swoop, by virtue of the fact that each component only contains wagers of acceptably similar size.
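For concreteness, below is roughly the naive first pass I have been experimenting with, as a rough Python sketch only: it assumes equal wager amounts and greedily merges the one-leg (clique) groupings described above by bucketing bets on the five legs they hold fixed, rather than computing all pairwise distances. The names are mine and purely illustrative.

from collections import defaultdict

def group_one_leg(bets):
    # Greedy first pass: merge single bets that agree on five legs into one
    # grouped bet with multiple terms in the remaining leg (the 'clique'
    # groupings described above). 'bets' is an iterable of 6-tuples of
    # runner numbers; all wager amounts are assumed equal.
    remaining = set(bets)
    groups = []
    for leg in range(6):
        buckets = defaultdict(list)
        for bet in remaining:
            key = bet[:leg] + bet[leg + 1:]   # the five legs held fixed
            buckets[key].append(bet)
        for key, members in buckets.items():
            if len(members) > 1:              # a genuine merge, not a singleton
                runners = tuple(sorted(b[leg] for b in members))
                fixed = [(v,) for v in key]   # each fixed leg as a 1-tuple of runners
                groups.append(tuple(fixed[:leg] + [runners] + fixed[leg:]))
                # Greedy: once merged, a bet is no longer available for
                # merging on a later leg, so no optimality is claimed.
                remaining.difference_update(members)
    # Anything left over stays as an ungrouped single bet.
    groups.extend(tuple((r,) for r in bet) for bet in remaining)
    return groups

On the four-bet example above, this yields the two one-leg groups 4 / 7 + 8 / 12 / 1 / 1 / 9 and 4 / 7 + 8 / 12 / 1 / 1 / 14 rather than the full 2 x 2 grouping; getting from such one-leg groups to groupings with multiple terms in several legs is exactly where I am stuck.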