Hi Denis,

I personally think that this is not a discussion about probabilities but about fail-safe and robust implementation.
I understand that you do not want to add logic to the id generation that can mitigate collisions (e.g. a collision bit), because the probability that this logic ever solves a problem is "low" and the code must be maintained. I agree that whether this feature is useful and necessary is a question of probabilities. Yet I think that if anybody hits a collision, which may still happen with enough bad luck, then this person wants the system to behave robustly and fail-safe, and never to corrupt stored data. IMHO this is therefore not a question of probabilities but a question of fail-safe implementation, and thus a question of software quality.

Just my 5 cents.

Regards,
Fabian

2018-02-08 9:24 GMT+01:00 Denis Gervalle <[email protected]>:

> Hi Marc and Thomas,
>
> I followed your discussion with great interest. I agree that Thomas's
> very light proposal is good to put in place, since it has almost no
> negative impact and only benefits. I think there is also a possibility
> to mitigate the object issue with something similar (check the
> integrity of what we get, to at least detect an issue), but that's not
> perfect of course.
>
> That said, I would like to point you to this interesting question on
> StackOverflow
> (https://stackoverflow.com/questions/22029012/probability-of-64bit-hash-code-collisions)
> and remind you that, based on the Birthday Paradox, with the release of
> 4.x we raised our worrying threshold of documents/objects from 65535 to
> more than 4 billion… and it took a while (4 versions of XWiki) before
> we had the strong feeling that we needed to raise it. So, while the
> worrying threshold before 4.x was really low, the actual occurrence of
> a collision was already low.
>
> My own experience was that the risk before 4.x was really high with
> generated names, much higher than with names used by real users. When I
> was hit by that issue, I remember feeling really bad about it. This is
> also probably why you have raised this thread.
> The previous hash was too small and also had a questionable
> distribution.
>
> The MD5 algorithm, like many crypto hashes, is particularly well suited
> to providing a good distribution
> (http://michiel.buddingh.eu/distribution-of-hash-values). Truncating it
> to 64 bits may degrade this, but I doubt it would be significant for
> us. So, personally, I feel really comfortable with the current
> implementation, and I think you can sleep in peace as well.
>
> Just my thoughts about not raising fears when they are no longer really
> justified.
>
> Regards,
>
> --
> Denis Gervalle
> SOFTEC sa - CEO
>
> On 7 Feb 2018, 16:10 +0100, Denis Gervalle <[email protected]>,
> wrote:
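
For reference, the thresholds quoted above can be checked with the usual birthday-bound approximation. This is only a minimal sketch, assuming the pre-4.x ids were 32 bits wide and the current ones are 64 bits wide (the widths are inferred from the 65535 and 4 billion figures, they are not stated in the thread). The function names are made up for illustration and are not XWiki code.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance of at least one collision among n_items ids.

    Uses the birthday-bound approximation
    p ~= 1 - exp(-n * (n - 1) / (2 * 2**hash_bits)),
    which assumes the hash values are uniformly distributed.
    """
    space = 2 ** hash_bits
    exponent = -n_items * (n_items - 1) / (2 * space)
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(exponent)


def worrying_threshold(hash_bits: int) -> int:
    """Item count of the order sqrt(2**hash_bits), where collisions become likely."""
    return math.isqrt(2 ** hash_bits)


if __name__ == "__main__":
    # Assumed widths: 32 bits before 4.x, 64 bits since (not confirmed in the thread).
    for bits in (32, 64):
        n = worrying_threshold(bits)
        p = collision_probability(n, bits)
        print(f"{bits}-bit ids: threshold {n} items, p(collision) ~ {p:.3f}")
```

At the threshold itself the approximation gives roughly a 39% chance of at least one collision for either width, so the "worrying threshold" marks where a collision becomes plausible, not where it becomes certain. Well below the threshold the probability falls off quickly, but it never reaches zero, which is the point about fail-safe behaviour made above.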

