Hi,

While filling the wiki, I made an attempt to provide some more statistics about mergers, from the state-space point of view. Especially if we restrict ourselves to valid states, we enter a subset of the keyspace in which I believe mergers are more likely. Maybe the number of colours in the rainbow needs to be reconsidered. I don't know the exact considerations that led to the current 32 colours, but since hits during cracking are expected to be 6-13 times as effective, an equivalent multiplication of colours (and thus lookups) can't be too big a problem.

It was quite a headache to understand myself, but please give it a try: http://reflextor.com/trac/a51/wiki/KeystreamSpace

The main point is that there are groups of states which all lead to the same keystream - and thus create a merger. Such groups get bigger on average when you restrict yourself to valid states (or start by clocking forward 100 times). Maybe mergers also result from totally different states which happen to generate the same keystream; I wouldn't know. (It would be interesting to get this information out of the table-generation process.)

The chain-merging data is interesting. I see from the data below that the percentage of chains merged starts at 5% and then decreases linearly with the number of start values. The interesting thing is this:

[scatter plot omitted; vertical axis: % of chains lost, horizontal axis: chains left]

I clearly see two lines. It appears that round functions 10, 14, 17, 21, 24, 28 and 31 (+/- 1) generate 0.5% fewer chain mergers. Do we have any clue why? Do they share a common bit value, or have a certain number of bits set? Could you reconstruct the xor patterns for the 32 rounds, Sascha?

The fitted equations are:

  y = 4.5613e-9 * x + 0.1163
  y = 3.937e-9 * x + 0.0581

And if you could find the time to do it, I would really like to see the same table for the 100-clockings-forward case. I hope to relate some real-world data to the statistics derived above.

Regards,
M.
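For what it's worth, the two fitted lines can be evaluated directly. This is only a sketch of my reading of the plot - I am assuming x is the number of chains left and y is the percentage of chains lost in the next round (the function names are mine, not anything from the table generator):

```python
# Sketch of the two fitted lines from the merger plot, assuming
# x = chains left and y = % of chains lost in the next round.

def upper_line(x):
    # fit for the "ordinary" round functions
    return 4.5613e-9 * x + 0.1163

def lower_line(x):
    # fit for round functions 10, 14, 17, 21, 24, 28, 31 (+/- 1)
    return 3.937e-9 * x + 0.0581

if __name__ == "__main__":
    for x in (2**30, 800_000_000, 500_000_000):
        gap = upper_line(x) - lower_line(x)
        print(f"{x:>11d} chains: {upper_line(x):.2f}% vs "
              f"{lower_line(x):.2f}% (gap {gap:.2f}%)")
```

A small sanity check on this reading: at the full 2^30 start values the upper fit gives about 5.0% loss, which matches the observed ~5% merge rate in the first round.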
On 1/12/10 12:58 AM, sascha wrote:

For a 32-round table with 1024*1024*1024 start values:

  round  chains before round
    1    1073741823
    2    1023438982
    3     976087268
    4     932768763
    5     893795050
    6     857258423
    7     824594686
    8     793842630
    9     764984529
   10     742264968
   11     717542086
   12     693794500
   13     671348674
   14     653519282
   15     633911863
   16     615705741
   17     600811372
   18     584022929
   19     567966279
   20     553038610
   21     541126688
   22     527714176
   23     514813648
   24     504121909
   25     492442040
   26     480922764
   27     469931269
   28     461146395
   29     451036696
   30     441442433
   31     433670029
   32     ?
   33     ?

  sum: 20.5 Grounds (Giga-rounds) instead of 30 Grounds

The first column is the round number starting at 1, the second is the number of chains *before* that round. After the last round the chains were written unsorted (and no chain count was printed there), then the final sorter purged some more (numbers lost), so you have to interpolate. I could also do a 270M table in between to find its values.

I also have stats for an 8-round table, 1 Gchains at the start:

  1  1073741824
  2  1022078072
  3   975558741
  4   932155936
  5   893248017
  6   856191164
  7   822026508
  8   791037697
  9   ?

  sum: 7.3 Grounds instead of 8

On Mon, Jan 11, 2010 at 10:48:26PM +0100, Frank A. Stevenson wrote:

Some news, and a question to consider. I managed to speed up the ATI chain generation by close to 30%. Currently a single 5850 card can calculate ~650 chains per second, meaning my dual-card setup can complete a table in less than 2.5 days. I did this by unrolling my loop a little, and rather than shifting the output, I use indexed writes to cached memory. Since ALU clauses run in parallel with memory fetches in the GPU threading engine, this is almost a pure gain. I still have some more tricks up my sleeve, but first a question for Karsten or whoever would like to do the maths.

I am thinking about "merge-free table generation", and the procedure goes like this: start with 270M points, calculate the first round only and write to disk. Then read that output, and bucket-sort the DP1s, eliminating any merges.
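Sascha's 32-round counts can be turned into per-round merge percentages and a work total directly. A quick sketch (the counts are copied from the table above; rounds 32 and 33 were not printed and are left out):

```python
# Chains alive *before* each round, from Sascha's 1 Gchains / 32-round run
# (rounds 32 and 33 were not printed; they would need interpolation).
counts = [
    1073741823, 1023438982, 976087268, 932768763, 893795050,
    857258423, 824594686, 793842630, 764984529, 742264968,
    717542086, 693794500, 671348674, 653519282, 633911863,
    615705741, 600811372, 584022929, 567966279, 553038610,
    541126688, 527714176, 514813648, 504121909, 492442040,
    480922764, 469931269, 461146395, 451036696, 441442433,
    433670029,
]

# percentage of chains lost to merges in each round
loss_pct = [100.0 * (a - b) / a for a, b in zip(counts, counts[1:])]
for r, p in enumerate(loss_pct, start=1):
    print(f"round {r:2d}: {p:.2f}% merged")

# total work actually done over the 31 known rounds, in Grounds
work = sum(counts) / 1e9
print(f"known work: {work:.1f} Grounds "
      f"(vs {len(counts)} Grounds if no chains ever merged)")
```

The first-round loss comes out around 4.7%, consistent with the "starts at 5%" observation above, and the 31 known rounds already sum to the quoted ~20.5 Grounds.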
For non-merges, calculate the second round and write to disk. Repeat this for all 32 rounds, keeping fewer and fewer chains, and you will have produced a table containing only merges from the 32nd round. Clearly this is faster, as disk access is much quicker than calculating the rounds, but the real question is: how much work can you eliminate this way? What speedup will you get?

f
_______________________________________________ A51 mailing list [email protected] http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51

