Several recent discussions -- on the lists and in private -- have brought up the question of how best to parametrize tables. I'd like to present our current approach here and would appreciate a discussion.
+++ Constraints +++

We want to store a code book with quintillions of entries in a compressed form while staying within several constraints:

A) Computational power -- at most trillions of computations per decode.
B) Storage -- at most trillions of data words of storage.
C) Disk seeks -- at most tens of millions of seeks per decode.

Besides these physical limits, tables are limited by:

D) Mergers -- collisions are unavoidable, but they only cause damage when they occur under the same function.

+++ Optimizations +++

Two optimization techniques exist to remove the major bottlenecks of plain TMTO tables:

1) 'Distinguished points' decrease C). DP tables still have too many mergers, which bounds the maximum size of each table. Smaller tables are undesirable, since splitting the data into 'n' different tables increases the demand on A) by a factor of 'n'.

2) 'Rainbow tables' decrease D) and A) up to some point, because tables can be larger. Beyond that point, adding more 'colors' to the rainbow increases A).

Combining 1) and 2) is optimal for our application. So far we have used 32 rainbow colors, each ending in a 15-bit distinguished point. After the recent discoveries we might want to change that to 12 or 13 bits. Here is why:

+++ Parameters +++

Full calculations are in this Excel sheet: http://reflextor.com/Rainbow.Tables.xls

Assume we need to cover 1% of the key space of 2^61.26 (after the recent break-through):

  number of tables * table height * colors per row * points per color = 1% * 2^61.26

or: n * h * k * l = 2^54.62

These values should be stored compressed in 2 TB. That means:

  number of tables * table height = 2 TB

or: n * h = 2^37  =>  k * l = 2^17.62

We want 'n' to be as small as possible, but we are bound by mergers (which occur more frequently as the tables grow). The point at which c% of new chains are thrown away is at:

  (1 - h*l^2 / Z)^k = (1 - c)  =>  h*l^2 = Z * (1 - (1-c)^(1/k))

where 1/Z is the collision probability of A5/1, which I conservatively assume to be (1/2)*2^-61.26, i.e. Z = 2^62.26.
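The three constraints above can be solved for <h,l,n> given a choice of 'k'. As a sanity check on the arithmetic, here is a small Python sketch; the function and variable names are mine, Z = 2^62.26 and c = 50% are taken from the text, and the exact h and n values depend on constants in the Excel sheet (which may differ from this simple reading by a small constant factor) -- the l values, however, come out exactly as quoted.

```python
import math

# Constraints from the text (all as log2 exponents):
# coverage: n * h * k * l = 1% * 2^61.26  ~= 2^54.62
# storage:  n * h = 2^37                  (2 TB of compressed chain ends)
COVERAGE_EXP = 61.26 + math.log2(0.01)   # ~54.62
STORAGE_EXP = 37.0
Z = 2 ** 62.26   # inverse per-pair collision probability (from the text)
c = 0.5          # fraction of new chains lost to mergers

def solve(k):
    """Given k rainbow colors, return (log2 l, log2 h, log2 n)."""
    # k * l = 2^17.62 follows from coverage / storage
    log_l = (COVERAGE_EXP - STORAGE_EXP) - math.log2(k)
    # merger bound: h * l^2 = Z * (1 - (1 - c)^(1/k))
    log_h = math.log2(Z * (1 - (1 - c) ** (1 / k))) - 2 * log_l
    # storage: n * h = 2^37
    log_n = STORAGE_EXP - log_h
    return log_l, log_h, log_n
```

For example, `solve(32)` gives l = 2^12.62 and `solve(8)` gives l = 2^14.62, matching the results below; in both cases k * l = 2^17.62, as required by the storage constraint.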
Setting 'c' to 50%, we get a solution triple <h,l,n> for each number of colors 'k' we choose. These differ in their GPU and HDD requirements:

  Computations: T = n * k(k+1)/2 * l  (for each try, hundreds of which will be needed)
  HDD seeks:    H = n * k             (for each try)

+++ Results +++

  k = 8:  l = 2^14.62; h = 2^27.43; n = 2^9.57; T = 2^37.70; H = 2^20.91
  k = 32: l = 2^12.62; h = 2^29.48; n = 2^7.52; T = 2^37.25; H = 2^20.86

The difference between these choices is small, so the exact number of colors is not crucial. The speed-up over not using rainbow colors is 3 in this case, and much larger when using a larger key space (as before) or less storage.

Play with these values in the Excel sheet linked above; for example, increase 'c' to simulate the effect of spending more computational time upfront.

Looking forward to the discussion!

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
