As I said, it's not a perfect solution, but while we wait for one that does not hurt performance too much, it costs nothing and probably covers most existing use cases.
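The check in question boils down to comparing the reference that was asked for with the reference of the document the 64-bit id actually resolves to. A minimal sketch of that idea, using a hypothetical helper class rather than the real XWikiHibernateStore#loadXWikiDoc code:

import com.xpn.xwiki.XWikiException;
import com.xpn.xwiki.doc.XWikiDocument;

// Hypothetical helper, not actual platform code: refuses to hand back a document
// whose 64-bit id resolved to a different reference than the one requested.
public final class IdCollisionCheck
{
    public static void checkLoaded(XWikiDocument requested, XWikiDocument loaded) throws XWikiException
    {
        if (loaded != null && !loaded.getDocumentReference().equals(requested.getDocumentReference())) {
            // Same id, different document: a hash collision that would otherwise stay silent.
            throw new XWikiException(XWikiException.MODULE_XWIKI_STORE, XWikiException.ERROR_XWIKI_UNKNOWN,
                "Id collision: requested [" + requested.getDocumentReference() + "] but id ["
                    + requested.getId() + "] belongs to [" + loaded.getDocumentReference() + "]");
        }
    }
}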
On Wed, Feb 7, 2018 at 10:55 AM, Marc Sladek <[email protected]> wrote:

Ok, I understand now and agree, thanks for clarifying. Two caveats though:

1) As you mentioned, it applies only to document hash collisions. I'd however expect object hash collisions to be more frequent, since the average database contains more objects than documents.

2) It's not guaranteed that all projects using the XWiki platform call getDocument before saveDocument on the same instance. For example, our CMS 'celements' uses an XWikiDocumentCreator <https://github.com/celements/celements-xwiki/blob/dev/celements-model/src/main/java/com/celements/model/access/DefaultXWikiDocumentCreator.java#L23> to build XWikiDocuments from scratch without a getDocument call beforehand. It's possible that there is more code out there creating documents this way; it would not be covered by your proposed change.

On 2 February 2018 at 16:14, Thomas Mortagne <[email protected]> wrote:

By "most of the code" I mean most of the code I know :)

AFAIK everything in the XWiki platform and in most extensions does this, and it includes of course everything you do through the standard UI.

On Fri, Feb 2, 2018 at 3:51 PM, Marc Sladek <[email protected]> wrote:

With "most of the code", do you also mean when new documents are being created and stored (the scenario where collisions happen)?

IMHO there is nothing preventing collisions when doing:
* new XWikiDocument(docRef)
* modify the instance
* exists check and save

On 2 February 2018 at 15:28, Thomas Mortagne <[email protected]> wrote:

On Fri, Feb 2, 2018 at 3:07 PM, Marc Sladek <[email protected]> wrote:
> Introducing this certainly doesn't hurt, but I'm not sure how useful it is.

I never said it was the solution to all collisions, but it will cover most of the (very rare and never reported) overwrites at zero cost.

> Firstly, it can show collisions only after a document is already overwritten, thus the damage is already done.

Again, keep in mind that most of the code does:
* getDocument(DocumentReference)
* modify the XWikiDocument instance
* saveDocument()
so if getDocument() fails you are not going to overwrite anything.

> Secondly, loadXWikiDoc has to be called for the document which doesn't exist anymore, I guess this doesn't happen so often since the system won't list it anymore.

Not sure which use case you are referring to here. Are you talking about a document being deleted and then a document with a different reference but the same hash being saved?

On 2 February 2018 at 13:36, Thomas Mortagne <[email protected]> wrote:

For documents, what would already help a lot without any performance penalty is to compare the loaded document reference with the passed one in XWikiHibernateStore#loadXWikiDoc. That's because most of the code in XWiki applies the following logic: getDocument(), modify it, saveDocument().
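To make the two code paths discussed above concrete - the getDocument()/saveDocument() pattern that such a load-time check covers, versus a document built from scratch as celements does - here is a rough illustration; the calls are the usual oldcore API, but the snippet is simplified and not taken from the platform or celements:

// Assuming xwiki (com.xpn.xwiki.XWiki), context (XWikiContext) and docRef
// (DocumentReference) are in scope, as in typical oldcore code.

// Path 1: covered by a reference check in loadXWikiDoc, because the document is
// loaded by id before being modified and saved.
XWikiDocument doc = xwiki.getDocument(docRef, context);
doc.setContent("updated content");
xwiki.saveDocument(doc, context);

// Path 2: not covered; the document is built in memory and saved directly, so
// nothing compares its computed id with what is already stored under that id.
XWikiDocument fresh = new XWikiDocument(docRef);
fresh.setContent("new content");
if (!xwiki.exists(docRef, context)) {
    xwiki.saveDocument(fresh, context);
}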
On Fri, Feb 2, 2018 at 12:19 PM, Marc Sladek <[email protected]> wrote:

Hi Denis,

Thanks a lot for your answer. I know it's been a while, but I'd still like to follow up on it since it's quite a fundamental issue.

> Therefore, by improving the hash algorithm, the size of the ids, and the quality of the hashed key, we have considered ourselves to be safe enough for normal usage.

Still, with enough bad luck, documents and objects may be overwritten without a trace. This is not a stable implementation. And even worse, if hash collisions happen on any XWiki installation in the future (or have already happened since 4.x), they probably won't be easily associated with this issue because it's nearly impossible to debug.

While I do now understand the motivation to stick with hashes, I'm still not sure why collision detection would be difficult to introduce and why it's even "impossible for some API". Let me briefly outline an idea:

In XWikiHibernateStore#saveXWikiDoc on L615 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L615> an exists check on the doc id is already performed. If xwikidoc.fullName is also selected in the HQL, a comparison with doc.getDocumentReference() can expose an imminent collision before data is overwritten. At least an XWikiException should be thrown in this case. A similar thing could be done before saving BaseObjects on L1203 <https://github.com/xwiki/xwiki-platform/blob/stable-9.11.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/store/XWikiHibernateStore.java#L1203> to avoid collisions on object IDs.

I don't think a change like this would be difficult to implement; I could provide a PR of that sort. The performance penalty would have to be tested on your systems though, since the full name isn't indexed AFAIK.

Regards

Marc Sladek
synventis gmbh
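A rough sketch of what that save-time check could look like; the HQL and surrounding code are simplified stand-ins, not a patch against the actual XWikiHibernateStore#saveXWikiDoc:

// Assuming 'session' is the Hibernate Session used by saveXWikiDoc and 'doc' is
// the XWikiDocument being saved. Instead of only checking whether a row with the
// same id exists, also fetch its fullName and compare it with the document being saved.
org.hibernate.Query query = session.createQuery(
    "select xwikidoc.fullName from XWikiDocument as xwikidoc where xwikidoc.id = :id");
query.setLong("id", doc.getId());
String existingFullName = (String) query.uniqueResult();
if (existingFullName != null && !existingFullName.equals(doc.getFullName())) {
    // The id is already taken by a different document: abort instead of overwriting it.
    throw new XWikiException(XWikiException.MODULE_XWIKI_STORE,
        XWikiException.ERROR_XWIKI_STORE_HIBERNATE_SAVING_DOC,
        "Id collision: saving [" + doc.getFullName() + "] would overwrite ["
            + existingFullName + "], both map to id [" + doc.getId() + "]");
}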
On 30 November 2017 at 15:21, Denis Gervalle <[email protected]> wrote:

Hi Marc,

Here are some answers:

1) MD5 was already a dependency of our oldcore, and using SHA-1 would have added a dependency without bringing much benefit. Since we only used 64 bits of the MD5 anyway, I doubt using SHA-1 would have provided a better distribution.

2) Such collision detection is difficult to introduce into the existing code base; for some APIs it is even impossible. What you experienced with the 32-bit ids was my motivation for the changes in 4.x, and I can say, based on my long XWiki experience, that even with the poor 32-bit ids very few users were affected. Therefore, by improving the hash algorithm, the size of the ids, and the quality of the hashed key, we have considered ourselves to be safe enough for normal usage.

3) That's the worst point. I cannot answer for the first decision, as I wasn't involved yet, but regarding the changes introduced in 4.0, a change had been considered. The ids are only there to satisfy Hibernate and its loading mechanism. If we had used a counter, we would have had to manage a conversion table between ids and entity references, with all the additional complexity that implies (consistency issues, caching, ...). This is because we use entity references to point directly to documents (or even objects) everywhere in XWiki. Introducing that behaviour while keeping all the existing APIs unchanged would have been a huge amount of work, and it would probably have introduced a performance penalty as well. This is why we gave up on it and went for an improved hash solution. IMO, if we had to make such a change, we would be better off rewriting the storage service completely, and even dropping Hibernate, which, to be honest, does not bring much benefit to XWiki with its ORM aspects.

But if you really want the complete answers, you can look at these threads:
http://xwiki.markmail.org/thread/fuprtrnupz2uy37f
http://xwiki.markmail.org/thread/fsd25bvft74xwgcx

Regards,

--
Denis Gervalle
SOFTEC sa - CEO
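To put rough numbers on the scheme: since 4.x the id is taken from the MD5 hash of the serialized reference, truncated to 64 bits, and the collision-counter idea from Marc's original mail (quoted below) would set aside one or two of those bits. Purely as an illustration, not the actual XWikiDocument/BaseElement code:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustration only: a 64-bit id derived from MD5, with the two most significant
// bits reserved as a collision counter, as suggested in the original mail below.
public final class IdSchemeSketch
{
    public static long baseId(String serializedReference) throws NoSuchAlgorithmException
    {
        byte[] md5 = MessageDigest.getInstance("MD5")
            .digest(serializedReference.getBytes(StandardCharsets.UTF_8));
        // Keep 62 bits of the hash, leaving 2 bits free for the counter.
        return ByteBuffer.wrap(md5).getLong() & 0x3FFFFFFFFFFFFFFFL;
    }

    public static long withCounter(long baseId, int collisionCounter)
    {
        // collisionCounter in [0, 3], stored in the two top bits of the id.
        return baseId | (((long) (collisionCounter & 0x3)) << 62);
    }
}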
On 30 Nov 2017, 14:14 +0100, Marc Sladek <[email protected]> wrote:

Dear XWiki devs

We are using the XWiki platform for our applications but are sadly still stuck on 2.7.2. Lately we ran into issues on a large database and noticed "disappearing" BaseObjects. We were able to link it to XWIKI-6990 <http://jira.xwiki.org/browse/XWIKI-6990>, where Hibernate IDs collided (hash collisions) and overwrote other objects without any trace - neither visible in the history nor in a log file.

We analysed your implemented solution from 4.0+ in XWikiDocument <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/doc/XWikiDocument.java#L841> and BaseElement <https://github.com/xwiki/xwiki-platform/blob/stable-8.4.x/xwiki-platform-core/xwiki-platform-oldcore/src/main/java/com/xpn/xwiki/objects/BaseElement.java#L237> and noticed that you changed the 32-bit String#hashCode to a 64-bit MD5-based id, which makes a collision less likely. I have a few questions regarding your solution:

1) Is there any specific reason why you have chosen MD5 over SHA-1 or SHA-2?

2) Collisions are still possible and would be extremely hard to notice since they are completely silent. Have you considered implementing collision detection to at least log occurring collisions - or, even better, reserving 1-2 bits of the 64-bit id as a collision counter in case it happens?

3) To question the concept of generating a hash for an ID in general: wouldn't a database-defined "auto increment" be a much more robust solution for the Hibernate IDs? A collision would be impossible, and clustering/scalability is still possible with e.g. the InnoDB "interleaved" auto-increment lock mode. Why did you choose a hash-based solution in the first place?

I'm sorry if these questions were already answered on the dev mailing list or in issues; please link me to them, since I couldn't find any concrete answers.

Thanks for your time and regards

Marc Sladek
synventis gmbh

--
Thomas Mortagne
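For completeness, the database-assigned alternative from question 3 above would boil down to a mapping along the lines of this hypothetical JPA-style entity (the real XWiki mapping assigns the hash-based id in code), with the reference-to-id lookup that Denis describes as the price:

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

// Hypothetical entity illustrating a database-assigned id: collisions become
// impossible, but callers can no longer compute the id from the reference and
// need a lookup (reference -> id) instead.
@Entity
public class DocumentRow
{
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY) // e.g. MySQL/InnoDB auto_increment
    private long id;

    private String fullName;

    public long getId()
    {
        return id;
    }

    public String getFullName()
    {
        return fullName;
    }
}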

