On Wed, 31 May 2023 13:11:08 GMT, Roger Riggs <rri...@openjdk.org> wrote:

>> This is not a practical concern, IMO. By spec, `UUID.randomUUID` is 
>> generated from a cryptographically strong random source, with >120 bits of 
>> randomness, so a collision is extremely unlikely. The collision math involves 
>> the birthday paradox, but the Wikipedia article on UUIDs fortunately already 
>> gives us the approximate solutions:  
>> https://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions 
>> 
>> Quote: "Thus, the probability to find a duplicate within 103 trillion 
>> version-4 UUIDs is one in a billion." 
>> 
>> In other words, finding a collision in this test with 1M UUIDs points, with 
>> very high probability, to an implementation issue, not a test bug. Put yet 
>> another way, if a unit test with 1M UUIDs is able to find a collision, 
>> then this is a strong signal that many production systems that assume 
>> an extremely low collision probability are in for subtle misbehavior.
>
> My point was that it's probably not practical to test (more than once).  
> If it fails, it will be treated just as you propose and disregarded, and in 
> the meantime it consumes test cycles in each of the test contexts. Either 
> provide more information about the conditions under which it failed or remove 
> it.

Sorry, I have trouble following the argument here.

Let me reiterate: the probability of a bona fide collision is so vanishingly 
low that a test failure here is a strong signal that something is wrong with the 
implementation. We can put more guidance in the test comments, like "This 
is extremely unlikely to happen. If you see this failing, it very likely 
points to an implementation bug rather than to odd chance."
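
To make the suggestion concrete, here is a minimal sketch of what such a test 
and its comment could look like. It is not the actual test from the PR: the 
class name `UuidUniquenessSketch` and the helper `countDuplicates` are 
hypothetical, and the 1M sample size is simply the figure discussed above.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch, not the test from the PR under review.
final class UuidUniquenessSketch {

    // Generates n random UUIDs and returns how many of them were duplicates.
    static int countDuplicates(int n) {
        Set<UUID> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < n; i++) {
            // Set.add returns false when the element is already present.
            if (!seen.add(UUID.randomUUID())) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        int duplicates = countDuplicates(1_000_000);
        if (duplicates != 0) {
            // This is extremely unlikely to happen. If you see this failing,
            // it very likely points to an implementation bug rather than
            // to odd chance.
            throw new AssertionError("Duplicate random UUIDs found: " + duplicates);
        }
        System.out.println("No duplicates found");
    }
}
```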

What I expect to happen when that test fails is that it prompts an 
investigation with multiple stress tests to get a better estimate of the actual 
collision rate. If we actually see a collision, it is far more likely to be 
caused by a much-higher-probability error somewhere in the code than by chance. 
In fact, if this test is _actually noisy_ to the point that it becomes a testing 
problem, that already tells us the actual collision rate is many orders of 
magnitude higher than the math predicts, which is an even _stronger_ signal that 
random UUIDs are seriously broken for practical use.
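
For a rough sense of scale, the birthday-bound approximation 
p ≈ n(n-1) / (2 * 2^bits) can be evaluated directly. The sketch below assumes 
122 random bits for a version-4 UUID (128 bits minus the fixed version and 
variant bits) and uses the small-p approximation only. The class and method 
names are hypothetical.

```java
// Hypothetical sketch of the birthday-bound arithmetic; valid only while p is small.
final class UuidBirthdayBound {

    // Assumption: a version-4 UUID carries 122 random bits.
    static final int RANDOM_BITS = 122;

    // Approximate probability of at least one collision among n random values
    // drawn uniformly from a space of 2^bits values.
    static double collisionProbability(double n, int bits) {
        return n * (n - 1.0) / (2.0 * Math.pow(2.0, bits));
    }

    // Inverse view: how many effective random bits would produce the
    // observed per-run failure rate with n values per run.
    static double impliedBits(double n, double observedRate) {
        return Math.log(n * (n - 1.0) / (2.0 * observedRate)) / Math.log(2.0);
    }

    public static void main(String[] args) {
        double n = 1_000_000;
        System.out.printf("p(collision), 1M UUIDs: %.3e%n",
                collisionProbability(n, RANDOM_BITS));
        System.out.printf("p(collision), 103 trillion UUIDs: %.3e%n",
                collisionProbability(103e12, RANDOM_BITS));
        System.out.printf("bits implied by a 1%% failure rate: %.1f%n",
                impliedBits(n, 0.01));
    }
}
```

Under these assumptions, the probability for 1M UUIDs comes out below 1e-25, 
and the 103 trillion case lands at about one in a billion, in line with the 
Wikipedia quote. A test that failed in even 1% of runs would imply only about 
45 effective random bits instead of 122.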

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/14134#discussion_r1211776452
