Adam,
Do those REST endpoints enforce the ETag protocol? The ones I designed
required ETags on update for just this reason.
If ETags are not being enforced, doing xdmp:lock-for-update on the id should
give you the same benefit Mike describes of using the id as the document URI.
Acquiring that lock first (call it Step 1.5 below) before looking at the
current state of the existing documents (Steps 2-5) should serialize the update
path and prevent multiple threads from seeing the same "before" state until any
currently updating thread commits. Threads waiting on that lock will see the
result of the update they were waiting for.
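Here is a minimal Python model of "Step 1.5", runnable outside MarkLogic. It assumes an in-memory dict stands in for the database and one threading.Lock per id stands in for xdmp:lock-for-update on the id. All names (docs, lock_for_id, post) are hypothetical. The sketch shows the serialization only, not how MarkLogic implements its locks.

```python
import threading
from collections import defaultdict

# Hypothetical in-memory stand-in for the database: URI -> document fields.
docs = {"/1234-v1.xml": {"id": "1234", "version": 1, "collections": {"live"}}}
docs_guard = threading.Lock()            # makes each individual read/write atomic
id_locks = defaultdict(threading.Lock)   # one lock per logical id ("Step 1.5")
id_locks_guard = threading.Lock()        # protects the defaultdict itself


def lock_for_id(doc_id):
    """Return the lock for this id, creating it on first use (a lock on the id, not on a real URI)."""
    with id_locks_guard:
        return id_locks[doc_id]


def live_docs(doc_id):
    with docs_guard:
        return [d for d in docs.values() if d["id"] == doc_id and "live" in d["collections"]]


def post(doc_id, version):
    """Steps 2-5 of the endpoint, run while holding the per-id lock."""
    with lock_for_id(doc_id):                                  # Step 1.5
        current = live_docs(doc_id)                            # Step 2
        if any(d["version"] >= version for d in current):      # Step 3: not newer, stop
            return False
        with docs_guard:
            for d in current:                                  # Step 4
                d["collections"].discard("live")
            docs[f"/{doc_id}-v{version}.xml"] = {              # Step 5
                "id": doc_id, "version": version, "collections": {"live"}}
        return True


# Eight concurrent POSTs for the same id, carrying versions 2..9.
threads = [threading.Thread(target=post, args=("1234", v)) for v in range(2, 10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print([d["version"] for d in live_docs("1234")])  # [9]
```

The lock key is the id rather than any document URI. Requests for different versions of the same record therefore still contend, which URI locking alone did not provide.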
BTW, it's not necessary to explicitly call xdmp:set-transaction-mode or
xdmp:commit in most cases. An XQuery statement begins as an update if it can
potentially call any update methods (this is determined at static analysis time
by examining the expression tree). Any updates performed during the
run are automatically committed upon successful completion. If anything goes
wrong, all updates are discarded and locks released.
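The commit-on-success behaviour can be modelled roughly in Python. This sketch assumes writes are buffered for the duration of one statement and applied only if the statement finishes without raising. The names are hypothetical. It does not model MarkLogic's static detection of update statements or its storage layer.

```python
import threading

# Hypothetical model of "commit on success, discard on error"; not MarkLogic's implementation.
store = {"/test/foo.xml": 1}
store_lock = threading.Lock()


class Txn:
    def __init__(self):
        self.pending = {}              # writes buffered until the body finishes

    def read(self, uri):
        return self.pending.get(uri, store.get(uri))

    def insert(self, uri, value):
        self.pending[uri] = value


def run_update(body):
    """Run body(txn) and apply its writes only if it returns normally."""
    with store_lock:                   # released on normal exit and on exceptions
        txn = Txn()
        result = body(txn)             # an exception here skips the commit below
        store.update(txn.pending)      # "commit"
        return result


def bump(txn):
    txn.insert("/test/foo.xml", txn.read("/test/foo.xml") + 1)


def bump_then_fail(txn):
    bump(txn)
    raise ValueError("something went wrong")


run_update(bump)
try:
    run_update(bump_then_fail)
except ValueError:
    pass
print(store["/test/foo.xml"])  # 2
```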
Here's a test. Run this:
xdmp:document-insert ("/test/foo.xml", <root><version>1</version></root>)
Put this module in an appserver root:
xquery version "1.0-ml";
declare variable $uri := "/test/foo.xml";
declare function local:update()
{
let $version := xs:integer (fn:doc ($uri)/root/version)
let $new-version := $version + 1
let $_ := xdmp:log ("current version: " || $version || ", new version: " || $new-version)
return xdmp:document-insert ($uri, <root><version>{ $new-version }</version></root>)
};
(: random delay of up to ~10 seconds so that the spawned tasks overlap :)
xdmp:sleep (fn:ceiling (xdmp:random() mod 10000)),
(: serialize the spawned tasks on an arbitrary lock key :)
xdmp:lock-for-update ("thisisthelockkey"),
local:update()
Run this in QC for that appserver and watch the log output (adjust module
path as needed).
xquery version "1.0-ml";
xdmp:spawn ("/xquery/foo.xqy"),
xdmp:spawn ("/xquery/foo.xqy"),
xdmp:spawn ("/xquery/foo.xqy"),
xdmp:spawn ("/xquery/foo.xqy")
=>
2014-07-02 12:37:44.384 Info: TaskServer: current version: 6, new version: 7
2014-07-02 12:37:51.285 Info: TaskServer: current version: 7, new version: 8
2014-07-02 12:37:52.890 Info: TaskServer: current version: 8, new version: 9
2014-07-02 12:37:54.039 Info: TaskServer: current version: 9, new version: 10
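For readers without a MarkLogic instance, the same experiment can be approximated in Python. Four threads start after a random delay, serialize on one lock key, and read-increment-write a version number. This is a hypothetical in-memory analogue, not the XQuery test itself. Starting at version 1 matches the initial xdmp:document-insert above.

```python
import random
import threading
import time

# Hypothetical in-memory analogue of the XQuery test above.
doc = {"/test/foo.xml": 1}                      # state after the initial document insert
locks = {"thisisthelockkey": threading.Lock()}  # the arbitrary lock key
log = []


def update(uri="/test/foo.xml"):
    time.sleep(random.random() * 0.05)          # random start delay, like xdmp:sleep
    with locks["thisisthelockkey"]:             # stands in for xdmp:lock-for-update on the key
        version = doc[uri]
        new_version = version + 1
        log.append(f"current version: {version}, new version: {new_version}")
        doc[uri] = new_version


threads = [threading.Thread(target=update) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("\n".join(log))
```

Each thread reads the version only after acquiring the lock, so every log line starts from the previous thread's result, as in the MarkLogic log.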
---
Ron Hitchens {[email protected]} +44 7879 358212
On Jul 1, 2014, at 3:53 PM, "Retter, Adam (RBI-UK)" <[email protected]>
wrote:
> Hi Michael,
>
> Thanks for your reply. I guess I am still missing something, as it is not
> clear to me how encoding both the id and version into the document URI would
> help me. I could understand it if I were just encoding the id, as that will
> not change over time; however, each request potentially writes a different
> version.
>
> Example 1
> ========
> For example, suppose I follow your suggestion of encoding the id and version
> into the document URI, and that a document with id=1234 and version=1
> already exists in the database, so its URI is /1234-v1.xml:
>
> XQuery Thread
> -------------------
> 0) Set transaction mode to 'updating'
> 1) XQuery REST Endpoint - receives an XML document over HTTP POST with
> id=1234 and version=2
> 2) Searches the database for an existing document containing an id element
> with value 1234. It finds the document /1234-v1.xml. (I assume this causes the
> query transaction to take a READ lock on the document URI /1234-v1.xml?)
> 3) It compares the versions of the two documents. The posted document has a
> newer version, so it continues.
> 4) Removes the document /1234-v1.xml from the 'live' collection. (Does this
> take a WRITE lock on /1234-v1.xml?)
> 5) Inserts the posted content into the database as /1234-v2.xml and adds it
> to the 'live' collection. (Does this take a WRITE lock on /1234-v2.xml?)
> 6) Calls xdmp:commit (presumably all READ and WRITE locks are released here?)
>
> If I have more than one of these threads executing in parallel, it seems to
> me that, through thread pre-emption, it is still possible for more than one
> thread to reach the end of step (3) before any lock contention occurs.
> Imagining just two threads in parallel for the moment, I think that means
> that when the first thread to acquire the lock releases it in (6), the other
> thread will continue through (4) - (6). Is that correct? If so, that leads to
> a different class of errors: a) if both posted documents have version=2,
> then yes, I cannot create a duplicate in the database, as the second thread
> to complete will overwrite the v2 document of the first thread, but which was
> meant to be the correct v2? b) If the posted documents have different
> versions, both greater than version=1, then I may end up with both version=2
> and version=3 documents in the live collection.
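Case b) can be reproduced deterministically in a small Python model. A threading.Barrier stands in for unlucky scheduling (an assumption of the model). It makes both requests finish steps 2-3 before either one writes. Each insert goes to a different URI, so nothing forces the two requests to conflict. The names are hypothetical.

```python
import threading

# Hypothetical in-memory database, starting with /1234-v1.xml in the 'live' collection.
docs = {"/1234-v1.xml": {"id": "1234", "version": 1, "collections": {"live"}}}
# Forces the unlucky schedule: both threads finish steps 2-3 before either writes.
both_checked = threading.Barrier(2)


def post_without_lock(doc_id, version):
    current = [d for d in docs.values()
               if d["id"] == doc_id and "live" in d["collections"]]   # step 2
    newer = all(d["version"] < version for d in current)              # step 3
    both_checked.wait()
    if not newer:
        return
    for d in current:                                                 # step 4: stale snapshot
        d["collections"].discard("live")
    docs[f"/{doc_id}-v{version}.xml"] = {                             # step 5: new URI per version
        "id": doc_id, "version": version, "collections": {"live"}}


threads = [threading.Thread(target=post_without_lock, args=("1234", v)) for v in (2, 3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(d["version"] for d in docs.values() if "live" in d["collections"]))  # [2, 3]
```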
>
> If I understand you correctly and my assumptions above are correct, then to
> a) prevent inserting the same version and id, and to b) also prevent
> inserting the same id and different versions, we would need to re-design our
> document URI scheme to *just* include the id of the document and *not* the
> version. Is that correct?
>
> As you suggested, I was considering using xdmp:lock-for-update. Introducing
> this between steps (1) and (2) above and taking the lock on the
> id of our record (i.e. ignoring the version) does indeed seem to fix our
> issues. Thank you very much for your guidance, Mike. If you have any comments
> or clarifications on what I have written and my assumptions, I would be glad
> to hear from you further...
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Michael Blakeley
> Sent: 30 June 2014 18:47
> To: MarkLogic Developer Discussion
> Subject: Re: [MarkLogic Dev General] Locking and Transactions in REST
> read+update
>
> MarkLogic automatically locks document URIs as necessary. The goal is to
> design your document URIs to enforce whatever constraints you need.
>
> The best way to avoid a conflict is to build the version into the document
> URI, as well as having it in the XML. If your URI is something like
> /{$id}/{$version} then concurrent attempts to insert the same id and version
> will try to lock the same URI. One of them will win, and the other will
> retry. This also means step #2 in your process is as simple as
> exists(doc($uri)) - but not xdmp:exists, because that function won't
> read-lock the URI.
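Below is a minimal Python model of the id/version URI idea. It assumes one lock per URI stands in for MarkLogic's automatic URI locking. In MarkLogic the losing transaction is retried. Here the loser simply waits for the lock and then finds the document already present, which gives the same end state. The names are hypothetical.

```python
import threading
from collections import defaultdict

docs = {}                                  # URI -> body
uri_locks = defaultdict(threading.Lock)    # stands in for automatic per-URI locking
registry_guard = threading.Lock()
results = []


def lock_for(uri):
    with registry_guard:
        return uri_locks[uri]


def insert_version(doc_id, version, body):
    uri = f"/{doc_id}/{version}"           # id and version are both part of the URI
    with lock_for(uri):                    # writers of the same id+version contend here
        if uri in docs:                    # step #2 becomes a simple existence check
            results.append(False)
            return
        docs[uri] = body
        results.append(True)


threads = [threading.Thread(target=insert_version, args=("1234", 2, f"<doc from client {n}/>"))
           for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results), list(docs))  # [False, True] ['/1234/2']
```

This only makes writers of the same id and version contend. As Adam points out above, a v2 and a v3 for the same id still get different URIs.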
>
> If for some reason you can't build the id and version into your URIs, fake it
> with an intent lock. Use whatever real URI you like, but in the same insert
> code construct a fake URI with the id and version, and call
> https://docs.marklogic.com/xdmp:lock-for-update to lock that fake URI
> explicitly. Again any concurrent requests will have to resolve the conflict,
> and one will win. You'll still have to check for existing versions in step
> #2, but at least you'll have a write lock on the id and version.
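The intent-lock variant differs only in that the locked key is never written. In this hypothetical Python model the real URI is a random UUID (an assumption). The existence check in step #2 therefore has to look inside the documents rather than at the URI. The lock-key naming and helper names are illustrative only.

```python
import threading
import uuid
from collections import defaultdict

docs = {}                                   # real URI -> document
docs_guard = threading.Lock()
intent_locks = defaultdict(threading.Lock)  # keyed by "fake" URIs that are never written
registry_guard = threading.Lock()


def intent_lock(key):
    with registry_guard:
        return intent_locks[key]


def insert_with_intent_lock(doc_id, version, body):
    fake_uri = f"/{doc_id}/{version}"       # only ever used as a lock key
    with intent_lock(fake_uri):
        with docs_guard:
            snapshot = list(docs.values())
        # The real URI reveals nothing, so step #2 must inspect document contents.
        if any(d["id"] == doc_id and d["version"] == version for d in snapshot):
            return False
        with docs_guard:
            docs[f"/docs/{uuid.uuid4()}.xml"] = {"id": doc_id, "version": version, "body": body}
        return True


results = []
threads = [threading.Thread(target=lambda n=n: results.append(
               insert_with_intent_lock("1234", 2, f"<doc from client {n}/>")))
           for n in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results), len(docs))  # [False, False, True] 1
```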
>
> Note that conflict resolution can be bad for performance. It's best to design
> your ingestion process such that conflicts will be rare. Having that step #2
> helps, but this is another reason to prefer a real id-version URI over an
> intent lock.
>
> -- Mike
>
> On 30 Jun 2014, at 09:33 , Retter, Adam (RBI-UK) <[email protected]>
> wrote:
>
>> We have what I consider to be an interesting issue with an XQuery that is
>> run as a REST endpoint: we have identified at least two race conditions.
>> Typically I would fix these by enforcing something like a critical section
>> in the code through appropriate locking.
>> Unfortunately, after lots of head scratching and re-reading of documentation,
>> I cannot at the moment see how to solve this with the facilities provided in
>> MarkLogic and am looking for some guidance. I guess this is a common issue
>> that others must have solved before, so I am most likely missing something
>> obvious!
>>
>> Our REST endpoint effectively does the following:
>>
>> 1) XQuery REST Endpoint - receives an XML document over HTTP POST. Let's
>> call this document B.
>> 2) Searches the database for an existing document, which has an <id> element
>> with the same value as that in document B. Assuming we find a document, let
>> us call that document A.
>> 3) Check the version of document B against document A. The version is
>> indicated in a <version> element in each document respectively. The version
>> of document B should be newer than document A, if not then stop, else
>> continue.
>> 4) Remove document A from the 'live' collection
>> 5) Insert document B into the database and add it to the 'live' collection.
>>
>> Now this REST endpoint may be called by many clients in parallel, which
>> means that, besides the request adding document B, the above query may be
>> running in parallel for documents C, D, E ... N. I think we are seeing three
>> separate race conditions appearing:
>>
>> i) Steps (4) and (5) where the same version of the document with the same id
>> can be inserted into the live collection. Typically step (4) tries to ensure
>> there is only one live version by removing the old document (document A)
>> from the live collection, before adding the new document (document B) to the
>> live collection.
>>
>> ii) Steps (3) and (5) where multiple versions can be inserted into the live
>> collection.
>>
>> iii) Steps (3) and (5) where sometimes an older version is inserted after a
>> newer version.
>>
>> I believe that due to the number of client requests, we are effectively
>> seeing threads pre-empt other threads within this query and because no
>> explicit locking has yet been added to the system, we have problems.
>>
>> How can I make the steps (1) through (5) thread-safe?
>>
>> I have tried adding declare option xdmp:transaction-mode "update"; to my
>> REST query, and using an explicit xdmp:commit at the end. This has not
>> helped at all, and I think that is because we are never writing the same
>> document: every document we write in steps (4) and (5) will always have a
>> different URI in the database. I think we really need to be able to lock on
>> an abstract URI (e.g. the content of our id element) rather than the
>> document URI, as that varies over time in our model.
>>
>> I also looked at xdmp:lock-acquire, but it appears the locks are shared for
>> a single user, i.e. it states - "When a user locks a URI, it is locked to
>> other users, but not to the user who locked it", the problem I have here is
>> that this is a public un-authenticated REST end-point effectively so it will
>> always be the same user running the query as far as ML is concerned.
>>
>> Does anyone have any suggestions of how we might achieve what we are looking
>> for?
>>
>> Cheers Adam.
>>
>> DISCLAIMER
>> This message is intended only for the use of the person(s) ("Intended
>> Recipient") to whom it is addressed. It may contain information, which is
>> privileged and confidential. Accordingly any dissemination, distribution,
>> copying or other use of this message or any of its content by any person
>> other than the Intended Recipient may constitute a breach of civil or
>> criminal law and is strictly prohibited. If you are not the Intended
>> Recipient, please contact the sender as soon as possible.
>> Reed Business Information Limited. Registered Office: Quadrant House, The
>> Quadrant, Sutton, Surrey, SM2 5AS, UK.
>> Registered in England under Company No. 151537
>>
>> _______________________________________________
>> General mailing list
>> [email protected]
>> http://developer.marklogic.com/mailman/listinfo/general
>>
>