Hi Marc,

Here are some answers:

1) MD5 was already a dependency of our oldcore and using SHA1 would have added 
a dependency without bringing much benefit. Since we only used 64 bits of the 
MD5 anyway, I doubt using SHA1 would have provided a better distribution.

2) Such a collision detection is difficult to be introduced in the existing 
code base, for some API it is even impossible. What you experience with the 
32-bit ids had been my motivation to the changes in 4.x and I could say, based 
on my long XWiki experience, that even with the poor 32 bit ids, very few users 
had been affected. Therefore, by improving the hash algorithm, the size of the 
ids, and the quality of the hashed key, we have considered ourselves to be 
saved enough for a normal usage.

3) That’s the worst point. I cannot answer about the first decision, I wasn’t 
yet involve, but regarding the changes introduced in 4.0,  a change had been 
considered. The ids are only there to satisfy Hibernate and its loading 
mechanism. If we had used a counter, we had to manage a conversion table 
between ids and entity references with all the additional complexity 
(consistency issues, caching, ...). This is so because we use entity reference 
to point directly to document (or even objects) everywhere in XWiki. This would 
have been a huge work to introduce that behaviour and at the same time keeping 
all the existing API unchanged. It would probably have introduced a performance 
penalty as well. This is why we resigned and go for an improved hash solution. 
IMO, if we had to make such a change, we are even better rewriting the storage 
service completely, and even stop using Hibernate, which, to be honest, does 
not bring much benefit to XWiki with its ORM aspects.

But if you really want the complete answers, you can look at those threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx

Regards,

--
Denis Gervalle
SOFTEC sa - CEO

On 30 Nov 2017, 14:14 +0100, Marc Sladek <[email protected]>, wrote:
> Dear XWiki devs
>
> We are using the XWiki platform for our applications but sadly are still
> stuck with 2.7.2. Lately we ran into issues on a large database and noticed
> "disappearing" BaseObjects. We were able to link it to XWIKI-6990
> <http://jira.xwiki.org/browse/XWIKI-6990>, where hibernate IDs collided
> (hash collisions) and overwrote other objects without any trace - neither
> visible in the history nor in a log file.
>
> We analysed your implemented solution from 4.0+ in XWikiDocument
> <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841
> and BaseElement
> <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237
> and
> noticed that you changed the 32bit String#hashCode to 64bit MD5, which
> makes a collision less likely. I have a few questions regarding your
> solution:
>
> 1) Is there any specific reason why you have chosen MD5 over SHA-1 or 2?
>
> 2) Collisions are still possible and would be extremely hard to notice
> since they are completely silent. Have you considered to implement a
> collision detection to at least log occurring collisions - or even better
> reserve 1-2bits of the 64bit to be used as collision counter in the case of
> it happening?
>
> 3) To question the concept of generating a hash for an ID in general:
> Wouldn't a database defined "auto increment" be a much more robust solution
> for the hibernate IDs? A collision would be impossible and
> clustering/scalability is still possible with e.g. the InnoDB “interleaved”
> autoincrement lock mode. Why have you chosen a hash based solution in the
> first place?
>
> I'm sorry if these questions were already answered in the dev mailing list
> or on issues, please link me to them since I couldn't find any concrete
> answers.
>
> Thanks for your time and regards
>
> Marc Sladek
> synventis gmbh

Reply via email to