I believe the 'retry timeout' setting applies at the cluster level when a 
deadlock is detected between two requests.  One of the techniques MarkLogic 
uses to resolve deadlocks is to pick one of the requests and summarily kill 
it.  That causes any updates it has performed to be rolled back.  The killed 
request is then re-scheduled.

   In a busy cluster with lots of updates happening, the incidence of 
deadlocks goes up, which results in more deadlock breaking, and therefore 
more wasted work, because the killed requests must start over.  Managing 
locks in a cluster also requires communication between the nodes, which is 
not free.

   It's important to understand that an "update" does not actually need to 
perform any updates.  Requests are classified as either a query or an update 
before they start, by statically analyzing their call graph.  If it is 
possible for the request to call any update functions, then it runs as an 
update.
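
   For example, a request like this one (the URI and element are purely 
illustrative) never performs an update on its common path, but because 
xdmp:document-insert is reachable from its code, it is classified as an 
update and acquires locks:

xquery version "1.0-ml";

let $uri := "/example/doc.xml"  (: hypothetical URI :)
let $doc := fn:doc($uri)
return
  if (fn:empty($doc))
  then xdmp:document-insert($uri, <placeholder/>)  (: rarely taken branch :)
  else $doc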

   Updates acquire locks for the documents they touch: read locks for any 
document accessed, write locks for any document modified.  Queries always run 
lock-free, never acquiring any locks at all, because they run as-of a point 
in time.

   So if a commonly run request usually doesn't make any updates, but might, 
a fairly large locking overhead is incurred, and the more of those requests 
that run concurrently, the more likely there will be deadlocks that must be 
broken, resulting in more retries.  Read-only queries do not inhibit each 
other because there is no lock contention.  They scale much better, 
constrained only by available computing resources.

   Ensuring that requests run as queries will avoid deadlocks.  A common 
solution for "query-mostly" requests that might occasionally need to do an 
update is to move the update code into another module and then xdmp:invoke 
that code from the mainline query (as a separate transaction).  One way to 
verify that your code really is running as a lock-free query is to add this 
to the prologue:

declare option xdmp:update "false";

   If any update functions are in scope, an exception will be thrown before 
the request can start.
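
   The invoke pattern then looks something like this (the module path is 
hypothetical; the different-transaction isolation option runs the update in 
its own transaction, so the calling query stays lock-free):

xquery version "1.0-ml";
declare option xdmp:update "false";

(: "/modules/do-update.xqy" is a hypothetical update module :)
xdmp:invoke("/modules/do-update.xqy", (),
  <options xmlns="xdmp:eval">
    <isolation>different-transaction</isolation>
  </options>)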

   So, to summarize, it is possible that the request retry may be coming into 
play if many requests are deadlocking on the same document(s).  Check the logs 
(at debug level) to see if there are a lot of auto-retried deadlocks.  If the 
contending requests can be made into read-only queries then lock contention 
will not be a factor.  If occasional updates are required, they can be 
accommodated by isolating them into separate transactions.

   I hope that's helpful.

---
Ron Hitchens {[email protected]}  +44 7879 358212

On Feb 16, 2015, at 5:01 PM, "Muth, John, Springer UK" <[email protected]> 
wrote:

> Hello all,
> 
> Recently while investigating poor performance under load we've been confused 
> by the presence of some requests/threads/queries staying around longer than 
> we thought should be possible.
> 
> We have the following app server config settings:
> 'default time limit' = 5
> 'max time limit' = 60
> 
> But we were often seeing the oldest requests of 185 seconds.
> 
> Then we noticed in our 'Groups/Default' configuration there is a setting 
> 'retry timeout' set to what I assume is the default of 180. 
> 
> I'm guessing 5 + 180 = this 'retry timeout' setting is somehow playing a role 
> in those long-running requests.
> 
> What is the purpose of the 'retry timeout' setting? 
> What kinds of problems would cause requests to be retried?
> Should we expect the group-level 'retry timeout' setting to have precedence 
> over the app-server-specific settings time limit settings? 
> 
> Thanks for any advice,
> John
> _______________________________________________
> General mailing list
> [email protected]
> http://developer.marklogic.com/mailman/listinfo/general
