As I said, it's not a perfect solution, but while we wait for one that does not hurt performance too much, it costs nothing and probably covers most existing use cases.
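The check in question boils down to comparing the reference that was asked for with the reference of the document the 64-bit id actually resolves to. A minimal sketch of that idea, using a hypothetical helper class rather than the real XWikiHibernateStore#loadXWikiDoc code:

import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;

// Hypothetical helper, not actual platform code: refuses to hand back a document
// whose 64-bit id resolved to a different reference than the one requested.
public final class IdCollisionCheck
{
    public static void checkLoaded(XWikiDocument requested, XWikiDocument loaded) throws XWikiException
    {
        if (loaded != null && !loaded.getDocumentReference().equals(requested.getDocumentReference())) {
            // Same id, different document: a hash collision that would otherwise stay silent.
            throw new XWikiException(XWikiException.MODULE_XWIKI_STORE, XWikiException.ERROR_XWIKI_UNKNOWN,
                "Id collision: requested [" + requested.getDocumentReference() + "] but id ["
                    + requested.getId() + "] belongs to [" + loaded.getDocumentReference() + "]");
        }
    }
}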
On Wed, Feb 7, 2018 at 10:55 AM, Marc Sladek <[email protected]> wrote:

Ok, I understand now and agree, thanks for clarifying. Two caveats though:

1) As you mentioned, it applies only to document hash collisions. I'd however expect object hash collisions to be more frequent, since the average database contains more objects than documents.

2) It's not guaranteed that all projects using the XWiki platform call getDocument before saveDocument on the same instance. For example, our CMS 'celements' uses an XWikiDocumentCreator <https://github.com/celements/celements-xwiki/blob/dev/celements-model/src/main/java/com/celements/model/access/DefaultXWikiDocumentCreator.java#L23> to build XWikiDocuments from scratch without a getDocument call beforehand. It's possible that there is more code out there creating documents this way; it would not be covered by your proposed change.

On 2 February 2018 at 16:14, Thomas Mortagne <[email protected]> wrote:

By "most of the code" I mean most of the code I know :)

AFAIK everything in the XWiki platform and in most extensions does this, and it includes of course everything you do through the standard UI.

On Fri, Feb 2, 2018 at 3:51 PM, Marc Sladek <[email protected]> wrote:

With "most of the code", do you also mean when new documents are being created and stored (the scenario where collisions happen)?

IMHO there is nothing preventing collisions when doing:
* new XWikiDocument(docRef)
* modify the instance
* exists check and save

On 2 February 2018 at 15:28, Thomas Mortagne <[email protected]> wrote:

On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <[email protected]> wrote:
> Introducing this certainly doesn't hurt, but I'm not sure how useful it is.

I never said it was the solution to all collisions, but it will cover most of the (very rare and never reported) overwrites at zero cost.

> Firstly, it can show collisions only after a document is already overwritten, thus the damage is already done.

Again, keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()
so if getDocument() fails you are not going to overwrite anything.

> Secondly, loadXWikiDoc has to be called for the document which doesn't exist anymore, I guess this doesn't happen so often since the system won't list it anymore.

Not sure which use case you are referring to here. Are you talking about a document being deleted and then a document with a different reference but the same hash being saved?

On 2 February 2018 at 13:36, Thomas Mortagne <[email protected]> wrote:

For documents, what would already help a lot without any performance penalty is to compare the loaded document reference with the passed one in XWikiHibernateStore#loadXWikiDoc. That's because most of the code in XWiki applies the following logic: getDocument(), modify it, saveDocument().
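To make the two code paths discussed above concrete - the getDocument()/saveDocument() pattern that such a load-time check covers, versus a document built from scratch as celements does - here is a rough illustration; the calls are the usual oldcore API, but the snippet is simplified and not taken from the platform or celements:

// Assuming xwiki (com.xpn.xwiki.XWiki), context (XWikiContext) and docRef
// (DocumentReference) are in scope, as in typical oldcore code.

// Path 1: covered by a reference check in loadXWikiDoc, because the document is
// loaded by id before being modified and saved.
XWikiDocument doc = xwiki.getDocument(docRef, context);
doc.setContent("updated content");
xwiki.saveDocument(doc, context);

// Path 2: not covered; the document is built in memory and saved directly, so
// nothing compares its computed id with what is already stored under that id.
XWikiDocument fresh = new XWikiDocument(docRef);
fresh.setContent("new content");
if (!xwiki.exists(docRef, context)) {
    xwiki.saveDocument(fresh, context);
}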
On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <[email protected]> wrote:

Hi Denis,

Thanks a lot for your answer. I know it's been a while, but I'd still like to follow up on it since it's quite a fundamental issue.

> Therefore, by improving the hash algorithm, the size of the ids, and the quality of the hashed key, we have considered ourselves to be safe enough for normal usage.

Still, with enough bad luck, documents and objects may be overwritten without a trace. This is not a stable implementation. And even worse, if hash collisions happen on any XWiki installation in the future (or have already happened since 4.x), they probably won't be easily associated with this issue because it's nearly impossible to debug.

While I do now understand the motivation to stick with hashes, I'm still not sure why collision detection would be difficult to introduce and why it's even "impossible for some API". Let me briefly outline an idea:

In XWikiHibernateStore#saveXWikiDoc on L615 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615> an exists check on the doc id is already performed. If xwikidoc.fullName is also selected in the HQL, a comparison with doc.getDocumentReference() can expose an imminent collision before data is overwritten. At least an XWikiException should be thrown in this case. A similar thing could be done before saving BaseObjects on L1203 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203> to avoid collisions on object IDs.

I don't think a change like this would be difficult to implement; I could provide a PR of that sort. The performance penalty would have to be tested on your systems though, since the full name isn't indexed AFAIK.

Regards

Marc Sladek
synventis gmbh
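A rough sketch of what that save-time check could look like; the HQL and surrounding code are simplified stand-ins, not a patch against the actual XWikiHibernateStore#saveXWikiDoc:

// Assuming 'session' is the Hibernate Session used by saveXWikiDoc and 'doc' is
// the XWikiDocument being saved. Instead of only checking whether a row with the
// same id exists, also fetch its fullName and compare it with the document being saved.
org.hibernate.Query query = session.createQuery(
    "select xwikidoc.fullName from XWikiDocument as xwikidoc where xwikidoc.id = :id");
query.setLong("id", doc.getId());
String existingFullName = (String) query.uniqueResult();
if (existingFullName != null && !existingFullName.equals(doc.getFullName())) {
    // The id is already taken by a different document: abort instead of overwriting it.
    throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
        XWikiException.ERROR_XWIKI_STORE_HIBERNATE_SAVING_DOC,
        "Id collision: saving [" + doc.getFullName() + "] would overwrite ["
            + existingFullName + "], both map to id [" + doc.getId() + "]");
}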
On 30 November 2017 at 15:21, Denis Gervalle <[email protected]> wrote:

Hi Marc,

Here are some answers:

1) MD5 was already a dependency of our oldcore, and using SHA-1 would have added a dependency without bringing much benefit. Since we only used 64 bits of the MD5 anyway, I doubt using SHA-1 would have provided a better distribution.

2) Such collision detection is difficult to introduce into the existing code base; for some APIs it is even impossible. What you experienced with the 32-bit ids was my motivation for the changes in 4.x, and I can say, based on my long XWiki experience, that even with the poor 32-bit ids very few users were affected. Therefore, by improving the hash algorithm, the size of the ids, and the quality of the hashed key, we have considered ourselves to be safe enough for normal usage.

3) That's the worst point. I cannot answer for the first decision, as I wasn't involved yet, but regarding the changes introduced in 4.0, a change had been considered. The ids are only there to satisfy Hibernate and its loading mechanism. If we had used a counter, we would have had to manage a conversion table between ids and entity references, with all the additional complexity that implies (consistency issues, caching, ...). This is because we use entity references to point directly to documents (or even objects) everywhere in XWiki. Introducing that behaviour while keeping all the existing APIs unchanged would have been a huge amount of work, and it would probably have introduced a performance penalty as well. This is why we gave up on it and went for an improved hash solution. IMO, if we had to make such a change, we would be better off rewriting the storage service completely, and even dropping Hibernate, which, to be honest, does not bring much benefit to XWiki with its ORM aspects.

But if you really want the complete answers, you can look at these threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx

Regards,

--
Denis Gervalle
SOFTEC sa - CEO
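To put rough numbers on the scheme: since 4.x the id is taken from the MD5 hash of the serialized reference, truncated to 64 bits, and the collision-counter idea from Marc's original mail (quoted below) would set aside one or two of those bits. Purely as an illustration, not the actual XWikiDocument/BaseElement code:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: a 64-bit id derived from MD5, with the two most significant
// bits reserved as a collision counter, as suggested in the original mail below.
public final class IdSchemeSketch
{
    public static long baseId(String serializedReference) throws NoSuchAlgorithmException
    {
        byte[] md5 = MessageDigest.getInstance("MD5")
            .digest(serializedReference.getBytes(StandardCharsets.UTF_8));
        // Keep 62 bits of the hash, leaving 2 bits free for the counter.
        return ByteBuffer.wrap(md5).getLong() & 0x3FFFFFFFFFFFFFFFL;
    }

    public static long withCounter(long baseId, int collisionCounter)
    {
        // collisionCounter in [0, 3], stored in the two top bits of the id.
        return baseId | (((long) (collisionCounter & 0x3)) << 62);
    }
}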
On 30 Nov 2017, 14:14 +0100, Marc Sladek <[email protected]> wrote:

Dear XWiki devs

We are using the XWiki platform for our applications but are sadly still stuck on 2.7.2. Lately we ran into issues on a large database and noticed "disappearing" BaseObjects. We were able to link it to XWIKI-6990 <http://jira.xwiki.org/browse/XWIKI-6990>, where Hibernate IDs collided (hash collisions) and overwrote other objects without any trace - neither visible in the history nor in a log file.

We analysed your implemented solution from 4.0+ in XWikiDocument <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841> and BaseElement <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237> and noticed that you changed the 32-bit String#hashCode to a 64-bit MD5-based id, which makes a collision less likely. I have a few questions regarding your solution:

1) Is there any specific reason why you have chosen MD5 over SHA-1 or SHA-2?

2) Collisions are still possible and would be extremely hard to notice since they are completely silent. Have you considered implementing collision detection to at least log occurring collisions - or, even better, reserving 1-2 bits of the 64-bit id as a collision counter in case it happens?

3) To question the concept of generating a hash for an ID in general: wouldn't a database-defined "auto increment" be a much more robust solution for the Hibernate IDs? A collision would be impossible, and clustering/scalability is still possible with e.g. the InnoDB "interleaved" auto-increment lock mode. Why did you choose a hash-based solution in the first place?

I'm sorry if these questions were already answered on the dev mailing list or in issues; please link me to them, since I couldn't find any concrete answers.

Thanks for your time and regards

Marc Sladek
synventis gmbh

--
Thomas Mortagne
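For completeness, the database-assigned alternative from question 3 above would boil down to a mapping along the lines of this hypothetical JPA-style entity (the real XWiki mapping assigns the hash-based id in code), with the reference-to-id lookup that Denis describes as the price:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// Hypothetical entity illustrating a database-assigned id: collisions become
// impossible, but callers can no longer compute the id from the reference and
// need a lookup (reference -> id) instead.
@Entity
public class DocumentRow
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // e.g. MySQL/InnoDB auto_increment
    private long id;

    private String fullName;

    public long getId()
    {
        return id;
    }

    public String getFullName()
    {
        return fullName;
    }
}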

