Re: [A51] 5.6x efficiency increase by removing illegal states

Karsten Nohl Tue, 12 Jan 2010 11:53:39 -0800

This is one of the most effective optimizations we found so far. Thank  
you, M.


I'm reading the data as: Clocking the cipher in each chain link by an  
additional 100 steps will decrease the state space from to 2^61.36.  
Given the available trade-offs, this will either make the attack 39  
times faster (=2^(2*2.64)), the tables 6.23 times smaller (=2^2.64),  
or a combination between the two (see trade-offs below).

When choosing performance over size, the pre-computation time roughly  
doubles due the additional 100 clocks; when choosing smaller tables,  
the pre-computation time stays about the same but produces less data.

I would prefer to generate faster tables instead of smaller ones,  
still shooting for 1TB in tables. All other table parameters are  
affected by these new choices, of course. At 32 colors, we would need  
only 88 fairly large tables. Further increasing the number of colors  
decreases the number of tables and hence the ability to distribute the  
work while only improving the attack time by <10%.

I think there is no need to check-and-remove any illegal state if we  
can avoid generating many in the first palce. Let's verify that the  
85% decrease in state size can really be gained and if that checks,  
then start a second set of tables. This would be another major  
improvement over the literature!

++ The trade-offs are: ++

Let 2^M be the state space, 2^D be the available storage (in # of  
chains), and 1/2^C be the coverage, then the attack time 2^T (in # of  
computations) is:

Pure Distinguished Point Tables
T = 2M - 3C - 2D + 1
Read as:

Pure Rainbow Tables
T = 2M - 2C - 2D - 2

++++++++

On Jan 11, 2010, at 5:06 PM, M vd S wrote:

> On 1/11/10 5:09 AM, sascha wrote:
>> On Mon, Jan 11, 2010 at 12:11:53AM +0100, M vd S wrote:
>>
>>> On 1/10/10 8:41 PM, sascha wrote:
>>>
>>>> On Sun, Jan 10, 2010 at 05:15:46PM +0100, M vd S wrote:
>>>>
>>>>> Apart from the optimization, we can also observe that the state
>>>>> space converges to 2^64 * (1-84%) before generating the keystream.
>>>>> This must also mean that only 16% of 2^64 possible 64 bit  
>>>>> keystream
>>>>> can exist. I wouldn't know how this can be of value to us right  
>>>>> now,
>>>>> but it is worth noting.
>>>>>
>>>> well it is of great value: our tables need only be 16% the size  
>>>> they were
>>>> before. at the cost of 2-3 times more computational effort.
>>>>
>>>>
>>> I aimed at the fact that we will not see all possible 64 bit
>>> keystreams in practice. So there's some redundancy hidden in there.
>>>
>>> Looking at the last 64 bits of keystream used to cipher the first
>>> 114 bits, one in ten combinations don't exist. That's a weakness of
>>>
>>
>> you meant 9 in 10!?
>>
>>
> yes correct!
>
>>> the cipher in itself, as if we would know that bits A B and C are
>>> always one. Which - if it were that simple - would allow us to say
>>> something about the contents of the burst.
>>>
>>
>> the strength of the cipher is reduced by a factor of 6, now you have
>> a 2^61.5 state cipher, thats still pretty complex.
>>
>>
> factor 6 to 16, depending on which part of the 2*114 bit keystream you
> look at.
>
> But like I said, I don't see any clear way to generate / detect  
> illegal
> keystreams. Maybe someone here has an idea.
>
> ..
>>> Anyway. The table now looks like:
>>> N    S    cpu cost    table eff    total eff
>>> 0    84%    1.00    1.00    1.00
>>> 1    75%    1.02    1.56    1.54
>>> 2    72%    1.03    1.75    1.70
>>> 4    68%    1.06    2.00    1.88
>>> 8    63%    1.13    2.31    2.06
>>> 16    53%    1.25    2.94    2.35
>>> 32    37%    1.47    3.94    2.68
>>> 64    12%    1.89    5.50    2.91
>>> 96    0%    2.30    6.25    2.72
>>>
>>> So I would say we should restate our cracking goal from:
>>>
>>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>>> to the first 64 bits of keystream that get generated from that
>>> initial internal state. "
>>>
>>> to
>>>
>>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>>> _after the loading of Kc and the frame number_ to the first 64 bits
>>> of keystream that get generated from that initial internal state. "
>>>
>>>
>>
>> you mean:
>> "a mapping from the internal state of the A5/1 algorithm (64 bits)
>> that can occur after 100 initial rounds to the first ..."
>> Kc and frame number are clocked in regularly.
>> for those interested in details, the footnote says (actually not 100
>> but N, where N is 64 or 96 or 100 or sth).
>>
>>
> No. I mean the state from after Kc and frame number are clocked in.
>
> It's a semantical issue:
>
> - either you state the goal as you did, put the 100 clocking in the
> round function, and call the to-be reversed function the 64-bits
> directly generated by a certain state
>
> - or you state the goal as I did, and put the 100 clocking in the to- 
> be
> reversed function (and keep the round function untouched)
>
>
> But I realised that the efficiency gain is even bigger. If you find a
> hit in the current tables, you wind back the state 100 times and  
> find on
> average 1 ancestor state. If you'd find a hit in the new tables, you  
> can
> forward 100 times and back 100 times again, and find 13 sibling states
> on average, which would generate the exact same keystream.
>
> Some numbers, based on 100000 samples and clocking F/B or B only 100  
> times:
>
> #siblstates*    #F/B    #B  (*#siblstates is the number of siblings
> including the original state)
> 0    0    84664      (there were 0 states which when clocked
> forward/back 100 times have 0 sibling states / there are 84664 that
> can't be clocked back 100 times at all)
> 1    2766    2830  (there were 2766 states which when clocked
> forward/back 100 times have 1 sibling state / there are 2830 that when
> clocked back 100 times give 1 state)
> 2    3341    1704  etc etc
> 3    5165    1776
> 4    5501    1390
> 5    5525    1175
> 6    5487    961
> 7    5682    845
> 8    5482    699
> 9    5065    530
> 10    4927    511
> 11    4532    381
> 12    4229    365
> 13    3991    331
> 14    3633    264
> 15    3341    236
> ..
> 66    12    0
> 67    10    0
> 68    5    0
> 69    11    0
> 70    8    0
> 71    6    1
> 72    4    0
> 73    5    0
> 74    4    0
> 75    2    0
> 76    3    0
> 77    2    0
> 78    4    0
> 79    1    0
> 80    1    0
> 82    4    0
> 84    1    0
> 86    1    0
> 87    1    0
> 89    2    0
> 90    1    0
> 91    1    0
> 92    1    0
> 93    1    0
> 94    1    0
> 118    1    0
> WAVG:    13.09    1.00
>
> Note that the 1.00 follows from theory as well. These numbers are  
> pretty
> stable.
>
> So the tables are 13 times more effective. The cpu cost is 2.3 bigger,
> so the figure changes to 5.6 times more efficient. (but we may need  
> more
> colors in our rainbow - see below)
>>> i.e. take N=100. For implementation efficiency maybe better take a
>>> multiple of 8, so N=92.
>>>
>>
>> bitslicing doesn't care if its mod 8 or a prime and all of our  
>> (soon to
>> exist) implementations use bitslicing.
>>
>>
> It might be wise to keep the implementation friendly to those trying  
> to
> make lookup table versions. But 100 is a multiple of 4 so that might  
> be
> ok as well.
>
>>> When cracking, you should then of course still forward N times and
>>> then clockback again to find all sibling states leading to the same
>>> cipherstream.
>>>
>>>
>>
>> Ok, i am still a bit confused so i made a checklist and proposal  
>> for a consensus:
>>
>> 1) after loading the previous round keystream into our A5/1  
>> registers,
>>   we clock 100 times forward, then generate 64 bits of new keystream.
>>   Then the round function is applied and the loop continues.
>>
>>
> agreed. So the steps are
>
> loop {
> S = Sinitial
> S = S^round
> S = F(S,100)
> K = K(S,64)
> Sinitial = K
> }
>
> and for every chain we record the first Sinitial and the last K  
> generated.
>
>> 2) The whole keyspace shrinks to below 16% of 2^64
>>
>>
> depending on what you would call "key" - the initial state space is
> still 2^64. The 100-round state and thus the generated keystream space
> is 84% smaller, so that's a source of chain mergers if we don't use  
> the
> right round function. All states but the first in a chain are then
> member of a subset defined by the fact that we take 100 rounds and the
> round function. If we don't want to reduce our chances of finding a
> certain keystream at all with 84%, we have to make sure we have a  
> set of
> rounding functions that all together map this subset to the entire  
> 2^64
> spectrum. (So apart from the chain merger issue, we actually *need* at
> least 1/0.16 round functions to keep covering the entire 2^64 state
> space) The answer to the question I posted above, how one could
> determine what the subset looks like, could save us some headaches.
>
> (I had this idea to xor the previous state into the new key, so you
> would almost have no chain mergers. But this would also make lookup
> impossible...)
>
> Maybe we should add some colors as well - intuition says to get at
> 32*(1/16%)=200 total at least. (although the source of chain mergers
> until now might mostly be this state convergence as well, so we could
> live with 32 colors) This needs to be empirically determined. I  
> can't do
> those statistics here with no cuda card, especially since you would  
> need
> some 100000 samples to get stable numbers.
>
>> 3)    Therefore we need only compute 16% the amount of tables that  
>> we initially thought.
>>
> Or 1/13th, since lookups are 13 times as effective.
>
>> 3.5)  It takes 2 to 3 times longer to generate the tables and do  
>> lookup precomputation.
>>
> true.
>> 3.5a) Generation time is halved, storage requirement shrinks to 16%
>>
> to cover the complete keystream space, yes. To be as effective as now,
> storage shrinks to 1/13 or 8%.
>> 4) There is still a small chance that we find an illegal state in  
>> our database, as
>>   bits in the burst get encrypted with keystream that was produced  
>> not after 100,
>>   but 100 + N clockings
>>
> Clocking forward 100 times opens up 12 alternative pathways on  
> average,
> so there's ways around this. In 2.7% of cases (see above) you would  
> hit
> on the exact state you came from. In other cases there's a chance to  
> get
> around the illegal state. So this chance is small*small.
>
> With reference to the mail I sent yesterday, this is also a method to
> get even more states which in the end generate the same keystream:  
> clock
> forward 101 times, clock back 1 time into another path while  
> checking if
> the same bit of keystream had been generated. With roughly a 5% chance
> of finding more states leading to the same keystream. (and do the same
> clocking forward 100+N times for N=2..16 to get even more sibling  
> states)
>
> By the way: to verify if a state is in fact the state that led up to  
> the
> keystream, you would only have to generate the rest of the keystream  
> and
> match it against the actual keystream. The same holds for discarding
> streams after 100+N clockings, where you would match bits generated
> before the bits you've looked up. Considering the growth of ancestor
> states (some 5% per clock max) and the information in the keystream  
> (50%
> per clock) you can be sure to have the right Kc even before looking at
> another burst of the conversation.
>
> I didn't want to mess up the discussion too much before, but here's
> another one to consider: how about not clocking 100 forward but, say,
> 116. This would reduce our chances of finding a hit within the first
> 64+15 bits of keystream. But it also reduces our initial state space
> with 57%, i.e. we could lose one bit of data on every start value.
> That's 2% smaller tables. I don't think it is worth the mess it  
> creates
> when you would actually try and save this one bit. But I at least  
> wanted
> to share the thought, it might inspire other ideas.
>
>> 5) clocking 100 times is not optimal in terms of work per  
>> optimization but with a slight
>>   overhead the tables are cleaner (i suggest)
>>
> I agree. The end-to-end picture of what we're doing is somewhat easier
> to explain as well.
>
>
> I would like to repeat my request that anybody would redo these stats
> and confirm this is all ok.
>
> Also I think that it would be wise to put some of these findings and
> statistics in a central place (say, the wiki, to which I have no  
> access)
> for later reference and verification. There we could split it into 3  
> or
> more parts: statistics, table generation and cracking/reverse  
> clocking.
>
> Regards,
> M.
>
> _______________________________________________
> A51 mailing list
> [email protected]
> http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51

_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51

Re: [A51] 5.6x efficiency increase by removing illegal states

Reply via email to