Yes, steps #2, #4 and #5 will take locks as you described: a read lock in step 
#2 and write locks in steps #4 and #5. Any number of concurrent requests could 
get to step #3 before the write locks have to be taken.

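I believe URIs like {$id}-{$version} would be safe. Competing requests for the 
same id all read-lock the same live document in step #2 and then all need a 
write lock on it in step #4, so only one of them can proceed. Any threads that 
don't get the write locks will restart from the beginning, rather than resuming 
at step #4 or #5, and by restarting they will see the latest updates.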
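A minimal XQuery sketch of steps #2 to #5 with one URI per version, annotated with where the locks should be taken. It assumes MarkLogic's default (auto) transaction mode, <id> and <version> elements directly under the root element, and integer versions; local:save, local:version-uri and the <doc> wrapper in the usage line are hypothetical names, not part of any existing code.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch: one URI per version, /{$id}-v{$version}.xml.
   Assumes <id> and <version> are children of the root element and that
   versions are integers. Because the module calls update functions, it runs
   as an update and takes locks on the documents it reads and writes. :)

declare function local:version-uri($id as xs:string, $version as xs:integer)
  as xs:string
{
  fn:concat("/", $id, "-v", $version, ".xml")
};

declare function local:save($posted as document-node())
  as empty-sequence()
{
  let $id := fn:string($posted/*/id)
  let $version := xs:integer($posted/*/version)
  (: step #2: documents returned here should be read-locked :)
  let $live := cts:search(fn:collection("live"),
                 cts:element-value-query(xs:QName("id"), $id))
  let $live-version := fn:max(for $d in $live return xs:integer($d/*/version))
  return
    (: step #3: stop unless the posted version is newer :)
    if (fn:exists($live-version) and $version le $live-version) then ()
    else (
      (: step #4: needs a write lock on each old live URI, so competing
         requests for the same id should conflict here; all but one restart :)
      for $d in $live
      return xdmp:document-remove-collections(xdmp:node-uri($d), "live"),
      (: step #5: write lock on the new version's URI :)
      xdmp:document-insert(local:version-uri($id, $version), $posted,
        xdmp:default-permissions(), "live")
    )
};

(: In the REST endpoint the argument would come from
   xdmp:get-request-body("xml"); a literal document is used here. :)
local:save(document { <doc><id>1234</id><version>2</version></doc> })
```

The restart is what makes the early check in step #3 safe: a request that loses the lock conflict in step #4 re-runs step #2 and sees the winner's update.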

However, I think you're right that it's better to use {$id}-{$version} for 
archived versions and a URI without the version, such as {$id}-latest, for the 
live version. That makes the update less dependent on versioning logic. Perhaps 
more importantly, it optimizes getting the latest version by id: doc($uri) 
instead of a search by id and collection. Adding a new version means 
overwriting {$id}-latest and copying the old XML to {$id}-{$previous-version}. 
However, I think update speed will remain about the same, because you're 
already doing two updates: removing the 'live' collection from 
{$id}-{$previous-version} and writing to {$id}-{$current-version}.
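A minimal XQuery sketch of the live/archive scheme: read the live document directly, compare versions, copy the old XML to its versioned URI, then overwrite the live URI. It assumes the same <id>/<version> layout and integer versions as the thread; local:live-uri, local:archive-uri, local:save and the ".xml" suffix are hypothetical.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch: the live version always sits at /{$id}-latest.xml and
   older versions are copied to /{$id}-{$version}.xml. Assumes <id> and
   <version> are children of the root element and versions are integers. :)

declare function local:live-uri($id as xs:string) as xs:string
{
  fn:concat("/", $id, "-latest.xml")
};

declare function local:archive-uri($id as xs:string, $version as xs:integer)
  as xs:string
{
  fn:concat("/", $id, "-", $version, ".xml")
};

declare function local:save($posted as document-node())
  as empty-sequence()
{
  let $id := fn:string($posted/*/id)
  let $version := xs:integer($posted/*/version)
  (: a direct lookup instead of a search by id and collection; in an update
     this should read-lock the live URI even if no document exists there yet :)
  let $live := fn:doc(local:live-uri($id))
  let $live-version := xs:integer($live/*/version)
  return
    if (fn:exists($live) and $version le $live-version) then ()
    else (
      (: update 1: copy the old XML to its versioned archive URI :)
      if (fn:exists($live))
      then xdmp:document-insert(local:archive-uri($id, $live-version), $live)
      else (),
      (: update 2: overwrite the live URI; every writer for this id needs it :)
      xdmp:document-insert(local:live-uri($id), $posted)
    )
};

local:save(document { <doc><id>1234</id><version>2</version></doc> })
```

Because every request for an id reads and writes the same live URI, concurrent updates for that id should contend for one lock regardless of version, including the very first version of a new id.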

Better still, use something like {$id}/{$version} and {$id}/current. That gives 
you the advantages described above, and also gives you the ability to query all 
versions for an id using cts:directory-query.
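A minimal XQuery sketch of the directory layout: writing would follow the same read, compare, copy and overwrite steps as the {$id}-latest scheme above, only with these URI patterns, and all versions of one id can then be fetched with a single directory query. The leading "/", the ".xml" suffix and the helper names (local:current-uri, local:version-uri, local:all-versions) are assumptions for illustration.

```xquery
xquery version "1.0-ml";

(: Hypothetical sketch of the /{$id}/{$version} + /{$id}/current layout.
   The leading "/", the ".xml" suffix and the <version> element name are
   assumptions. :)

declare function local:current-uri($id as xs:string) as xs:string
{
  fn:concat("/", $id, "/current.xml")
};

declare function local:version-uri($id as xs:string, $version as xs:integer)
  as xs:string
{
  fn:concat("/", $id, "/", $version, ".xml")
};

(: Every stored version of one id, current and archived, oldest first.
   Depth "1" restricts the match to documents directly in /{$id}/. :)
declare function local:all-versions($id as xs:string) as document-node()*
{
  for $d in cts:search(fn:doc(),
              cts:directory-query(fn:concat("/", $id, "/"), "1"))
  order by xs:integer($d/*/version)
  return $d
};

(: Latest version: a single document lookup, no search needed. :)
fn:doc(local:current-uri("1234")),
(: One archived version, also a direct lookup. :)
fn:doc(local:version-uri("1234", 1)),
(: All versions for the id. :)
local:all-versions("1234")
```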

-- Mike

On 1 Jul 2014, at 07:53 , Retter, Adam (RBI-UK) <[email protected]> wrote:

> Hi Michael,
> 
> Thanks for your reply. I guess I am still missing something, as it is not 
> clear to me how encoding both the id and the version into the document URI 
> would help me. I could understand it if I were just encoding the id, as that 
> will not change over time; however, each request potentially writes a 
> different version.
> 
> Example 1
> ========
> For example, suppose I follow your suggestion of encoding the id and version 
> into the document URI, and a document already exists in the database with 
> id=1234 and version=1, so its URI is /1234-v1.xml:
> 
> XQuery Thread
> -------------------
> 0) Set the transaction mode to 'update'
> 1) XQuery REST Endpoint - receives an XML document over HTTP POST with 
> id=1234 and version=2
> 2) Searches the database for an existing document which contains an id 
> element with value 1234. It finds the document /1234-v1.xml. (I assume this 
> causes the query transaction to take a READ lock on the document URI 
> /1234-v1.xml?)
> 3) It compares the versions of the two documents. The posted document has a 
> newer version, so it continues.
> 4) Removes the collection 'live' from the document /1234-v1.xml. (Does this 
> take a WRITE lock on /1234-v1.xml?)
> 5) Inserts the posted content into the database as the document 
> /1234-v2.xml and adds it to the 'live' collection. (Does this take a WRITE 
> lock on /1234-v2.xml?)
> 6) Calls xdmp:commit (presumably all READ and WRITE locks are released here?)
> 
> If I have more than one of these threads executing in parallel, it seems to 
> me that through thread pre-emption it is still possible for more than one 
> thread to get at least as far as the end of (3) before any sort of lock 
> contention occurs. Imagining there are just two threads in parallel for the 
> moment, I think that means that when the first thread to acquire the lock 
> releases it in (6), the other thread will continue through (4) - (6); is 
> that correct? If so, that leads to a different class of errors: a) if both 
> posted documents that initiated the threads have version=2, then yes, I 
> cannot generate a duplicate in the database, as the second thread to 
> complete will overwrite the v2 document of the first thread, but which was 
> meant to be the correct v2? b) If the two posted documents have different 
> versions, both greater than version=1, then I may end up with both 
> version=2 and version=3 documents in the live collection.
> 
> If I understand you correctly and my assumptions above are correct, then to 
> a) prevent inserting the same version and id, and to b) also prevent 
> inserting the same id and different versions, we would need to re-design our 
> document URI scheme to *just* include the id of the document and *not* the 
> version. Is that correct?
> 
> As you suggested, I was considering using xdmp:lock-for-update. Introducing 
> it between steps (1) and (2) above and taking the lock on the id of our 
> record (i.e. ignoring the version) does indeed seem to fix our issues. Thank 
> you very much for your guidance, Mike. If you have any comments or 
> clarifications on what I have written and my assumptions, I would be glad 
> to hear from you further.
> 
> 
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: 30 June 2014 18:47
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Locking and Transactions in REST 
> read+update
> 
> MarkLogic automatically locks document URIs as necessary. The goal is to 
> design your document URIs to enforce whatever constraints you need.
> 
> The best way to avoid a conflict is to build the version into the document 
> URI, as well as having it in the XML. If your URI is something like 
> /{$id}/{$version} then concurrent attempts to insert the same id and version 
> will try to lock the same URI. One of them will win, and the other will 
> retry. This also means step #2 in your process is as simple as 
> exists(doc($uri)) - but not xdmp:exists, because that function won't 
> read-lock the URI.
> 
> If for some reason you can't build the id and version into your URIs, fake it 
> with an intent lock. Use whatever real URI you like, but in the same insert 
> code construct a fake URI with the id and version, and call 
> https://docs.marklogic.com/xdmp:lock-for-update to lock that fake URI 
> explicitly. Again any concurrent requests will have to resolve the conflict, 
> and one will win. You'll still have to check for existing versions in step 
> #2, but at least you'll have a write lock on the id and version.
> 
> Note that conflict resolution can be bad for performance. It's best to design 
> your ingestion process such that conflicts will be rare. Having that step #2 
> helps, but this is another reason to prefer a real id-version URI over an 
> intent lock.
> 
> -- Mike
> 
> On 30 Jun 2014, at 09:33 , Retter, Adam (RBI-UK) <[email protected]> 
> wrote:
> 
>> We have what I consider to be an interesting issue with an XQuery that is 
>> run as a REST endpoint: we have identified at least two race conditions. 
>> Typically I would fix these by enforcing something like a critical section 
>> in the code through appropriate locking. Unfortunately, after lots of head 
>> scratching and re-reading of documentation, I cannot at the moment see how 
>> to solve this with the facilities provided in MarkLogic and am looking for 
>> some guidance. I guess this is a common issue that others must have solved 
>> before, so I am most likely missing something obvious!
>> 
>> Our REST endpoint effectively does the following:
>> 
>> 1) XQuery REST Endpoint - receives an XML document over HTTP POST. Let's 
>> call this document B.
>> 2) Searches the database for an existing document, which has an <id> element 
>> with the same value as that in document B. Assuming we find a document, let 
>> us call that document A.
>> 3) Check the version of document B against document A. The version is 
>> indicated in a <version> element in each document respectively. The version 
>> of document B should be newer than document A, if not then stop, else 
>> continue.
>> 4) Remove document A from the 'live' collection
>> 5) Insert document B into the database and add it to the 'live' collection.
>> 
>> Now this REST endpoint may be called by many clients in parallel, which 
>> means not just adding the new document B, but in parallel running the above 
>> query for documents C, D, E ... N. I think we are seeing three separate race 
>> conditions appearing:
>> 
>> i) Steps (4) and (5), where the same version of the document with the same 
>> id can be inserted into the live collection more than once. Step (4) is 
>> meant to ensure there is only one live version by removing the old document 
>> (document A) from the live collection before adding the new document 
>> (document B) to the live collection.
>> 
>> ii) Steps (3) and (5) where multiple versions can be inserted into the live 
>> collection.
>> 
>> iii) Steps (3) and (5) where sometimes an older version is inserted after a 
>> newer version.
>> 
>> I believe that due to the number of client requests, we are effectively 
>> seeing threads pre-empt other threads within this query and because no 
>> explicit locking has yet been added to the system, we have problems.
>> 
>> How can I make the steps (1) through (5) thread-safe?
>> 
>> I have tried adding declare option xdmp:transaction-mode "update"; to my 
>> REST query, and using an explicit xdmp:commit() at the end. This has not 
>> helped at all, but I think that is because we are never writing the same 
>> document: every document we write in steps (4) and (5) will always have a 
>> different URI in the database. I think really that we need to be able to 
>> lock based on an abstract URI (e.g. the content of our id element) and not 
>> the document URI, as that varies over time in our model.
>> 
>> I also looked at xdmp:lock-acquire, but it appears the locks are shared for 
>> a single user; the documentation states: "When a user locks a URI, it is 
>> locked to other users, but not to the user who locked it". The problem I 
>> have here is that this is effectively a public, unauthenticated REST 
>> endpoint, so it will always be the same user running the query as far as 
>> ML is concerned.
>> 
>> Does anyone have any suggestions of how we might achieve what we are looking 
>> for?
>> 
>> Cheers Adam.
>> 
>> DISCLAIMER
>> This message is intended only for the use of the person(s) ("Intended 
>> Recipient") to whom it is addressed. It may contain information, which is 
>> privileged and confidential. Accordingly any dissemination, distribution, 
>> copying or other use of this message or any of its content by any person 
>> other than the Intended Recipient may constitute a breach of civil or 
>> criminal law and is strictly prohibited. If you are not the Intended 
>> Recipient, please contact the sender as soon as possible.
>> Reed Business Information Limited. Registered Office: Quadrant House, The 
>> Quadrant, Sutton, Surrey, SM2 5AS, UK. 
>> Registered in England under Company No. 151537
>> 
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>> 
> 
> 
