[ https://issues.apache.org/jira/browse/COCOON-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578347#action_12578347 ]

Alexander Daniel commented on COCOON-1985:
------------------------------------------

Two requests can deadlock each other in Cocoon 2.1.11 (even without using 
parallel includes with the include transformer):
* request A: generating lock for 55933
* request B: generating lock for 58840
* request B: waiting for lock 55933, which is held by request A
* request A: waiting for lock 58840, which is held by request B
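
The lock-ordering cycle above can be simulated outside Cocoon with two plain 
JDK locks. This is a minimal sketch, not Cocoon code: tryLock with a timeout 
stands in for Cocoon's blocking waitForLock so the demo terminates instead of 
hanging, and all class and lock names are illustrative.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class PipelockDeadlock {
    public static void main(String[] args) throws InterruptedException {
        // Hypothetical stand-ins for the two pipe locks (55933 and 58840).
        ReentrantLock lock55933 = new ReentrantLock();
        ReentrantLock lock58840 = new ReentrantLock();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        AtomicInteger timeouts = new AtomicInteger();

        Thread requestA = new Thread(worker(lock55933, lock58840, bothHoldFirstLock, timeouts));
        Thread requestB = new Thread(worker(lock58840, lock55933, bothHoldFirstLock, timeouts));
        requestA.start(); requestB.start();
        requestA.join(); requestB.join();
        // Both requests time out waiting for the lock the other one holds.
        System.out.println("timeouts=" + timeouts.get());
    }

    static Runnable worker(ReentrantLock first, ReentrantLock second,
                           CountDownLatch latch, AtomicInteger timeouts) {
        return () -> {
            first.lock();          // "generating lock" for the request's own key
            try {
                latch.countDown();
                latch.await();     // make sure both requests hold their first lock
                // "waiting for lock" held by the other request; the timeout
                // replaces Cocoon's unbounded wait so the demo can finish
                if (!second.tryLock(200, TimeUnit.MILLISECONDS)) {
                    timeouts.incrementAndGet();
                } else {
                    second.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                first.unlock();
            }
        };
    }
}
```

The latch guarantees the cyclic hold-and-wait is reached on every run, so 
both workers always time out.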


I can reproduce this behaviour with Apache Bench and the following pipeline:
* terminal 1: Apache Bench request A (ab -k -n 10000 -c 25 
http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
* terminal 2: Apache Bench request B (ab -k -n 10000 -c 25 
http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
* terminal 3: touching the two data files every second to invalidate the cache 
(while true; do echo -n "."; touch 55933.xml 58840.xml; sleep 1; done)

* pipeline:
<map:pipeline type="caching">
  <map:match pattern="productOfferForDevice*/*/">
    <map:generate src="cocoon:/exists/{2}.xml" label="a"/>
    <map:transform type="xsltc" src="productOfferIncludeDevice.xsl" label="b">
      <map:parameter name="noInc" value="{1}"/>
    </map:transform>
    <map:transform type="include" label="c"/>
    <map:serialize type="xml"/>
  </map:match>

  <map:match pattern="exists/**">
    <map:act type="resource-exists">
      <map:parameter name="url" value="{1}"/>
      <map:generate src="{../1}"/>
      <map:serialize type="xml"/>
    </map:act>
    <!-- not found -->
    <map:generate src="dummy.xml"/>
    <map:serialize type="xml"/>
  </map:match>
</map:pipeline>


After a few seconds the deadlock occurs:
* the Apache Bench requests run into a timeout

* I can see the following pipe locks in the default transient store:
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
 (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
 (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
 (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
 (class: org.mortbay.util.ThreadPool$PoolThread)


I added some logging to AbstractCachingProcessingPipeline.java which confirms 
the explanation above:
INFO  (2008-03-13) 13:50.16:072 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/) 
PoolThread-47/AbstractCachingProcessingPipeline: generating lock 
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO  (2008-03-13) 13:50.16:074 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/) 
PoolThread-47/AbstractCachingProcessingPipeline: generating lock 
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO  (2008-03-13) 13:50.16:075 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/) 
PoolThread-6/AbstractCachingProcessingPipeline: generating lock 
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO  (2008-03-13) 13:50.16:075 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/) 
PoolThread-6/AbstractCachingProcessingPipeline: generating lock 
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
INFO  (2008-03-13) 13:50.16:281 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/58840/) 
PoolThread-6/AbstractCachingProcessingPipeline: waiting for lock 
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO  (2008-03-13) 13:50.16:304 [sitemap] 
(/samples/reproduceMultipleThreads/productOfferForDevice/55933/) 
PoolThread-47/AbstractCachingProcessingPipeline: waiting for lock 
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml


With the attached reproduceMultipleThreads.tar.gz you can reproduce the 
behaviour yourself:
* download and extract Cocoon 2.1.11
* cd $CocoonHome
* ./build.sh
* cd build/webapp/samples
* tar -xzf $DownloadFolder/reproduceMultipleThreads.tar.gz
* cd ../../..
* ./cocoon.sh
* open 3 terminals and cd into 
$CocoonHome/build/webapp/samples/reproduceMultipleThreads in each

* dry run without invalidating the cache to see that everything is working:
  - terminal 1: ./terminal1.sh 
  - terminal 2: ./terminal2.sh

* run with invalidating the cache every second:
  - terminal 1: ./terminal1.sh
  - terminal 2: ./terminal2.sh
  - terminal 3: ./terminal3.sh
* When Apache Bench has run into a timeout, you can view the pipe locks at 
http://localhost:8888/samples/reproduceMultipleThreads/pipelocks


We are currently facing this issue on our production servers.

What is the best way to fix this?
* removing the pipeline locking code, as Ellis suggested?
* making waitForLock fuzzy?
* ...

If the pipe lock design is the same in Cocoon 2.2, the same deadlock could 
occur there.
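
The thread-identity fix described in the issue (store the owning Thread as the 
lock value so waitForLock can detect re-entry) could look roughly like the 
following. This is a minimal sketch under stated assumptions, not the actual 
Cocoon patch: class, method, and key names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class PipelineLocks {
    /** Lock entry records the owning thread, as in the patch described below. */
    private static final class Entry {
        final Thread owner = Thread.currentThread();
    }

    private final Map<String, Entry> store = new HashMap<>();

    /** Returns true if the key was free and the current thread now owns it. */
    public synchronized boolean generateLock(String key) {
        if (store.containsKey(key)) {
            return false;
        }
        store.put(key, new Entry());
        return true;
    }

    /** Returns without blocking when the current thread already holds the lock. */
    public synchronized boolean waitForLock(String key) throws InterruptedException {
        Entry e = store.get(key);
        if (e != null && e.owner == Thread.currentThread()) {
            // Same thread re-enters (e.g. via an include of the same pipeline):
            // proceed instead of self-deadlocking. This loses the cache-once
            // optimization for this request, but it cannot hang.
            return true;
        }
        while (store.containsKey(key)) {
            wait(); // woken by releaseLock()'s notifyAll()
        }
        return true;
    }

    public synchronized void releaseLock(String key) {
        store.remove(key);
        notifyAll();
    }

    public static void main(String[] args) throws Exception {
        PipelineLocks locks = new PipelineLocks();
        locks.generateLock("foo.xml");
        // The same thread re-enters via an include: returns instead of blocking.
        System.out.println("reentrant=" + locks.waitForLock("foo.xml"));
        locks.releaseLock("foo.xml");
    }
}
```

As the issue notes, this only solves the single-thread case; parallel include 
worker threads would still block here because they are different threads.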

> AbstractCachingProcessingPipeline locking with IncludeTransformer may hang 
> pipeline
> -----------------------------------------------------------------------------------
>
>                 Key: COCOON-1985
>                 URL: https://issues.apache.org/jira/browse/COCOON-1985
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>    Affects Versions: 2.1.9, 2.1.10, 2.1.11, 2.2-dev (Current SVN)
>            Reporter: Ellis Pritchard
>            Priority: Critical
>             Fix For: 2.2-dev (Current SVN)
>
>         Attachments: caching-trials.patch, includer.xsl, patch.txt, 
> sitemap.xmap
>
>
> Cocoon 2.1.9 introduced the concept of a lock in 
> AbstractCachingProcessingPipeline, an optimization to prevent two concurrent 
> requests from generating the same cached content. The first request adds the 
> pipeline key to the transient cache to 'lock' the cache entry for that 
> pipeline, subsequent concurrent requests wait for the first request to cache 
> the content (by Object.lock()ing the pipeline key entry) before proceeding, 
> and can then use the newly cached content.
> However, this has introduced an incompatibility with the IncludeTransformer: 
> if the inclusions access the same yet-to-be-cached content as the root 
> pipeline, the whole assembly hangs, since a lock will be made on a lock 
> already held by the same thread, and which cannot be satisfied.
> e.g.
> i) Root pipeline generates using sub-pipeline cocoon:/foo.xml
> ii) the cocoon:/foo.xml sub-pipeline adds its pipeline key to the transient 
> store as a lock.
> iii) subsequently in the root pipeline, the IncludeTransformer is run.
> iv) one of the inclusions also generates with cocoon:/foo.xml, this 
> sub-pipeline locks in AbstractProcessingPipeline.waitForLock() because the 
> sub-pipeline key is already present.
> v) deadlock.
> I've found a (partial, see below) solution for this: instead of a plain 
> Object being added to the transient store as the lock object, the 
> Thread.currentThread() is added; when waitForLock() is called, if the lock 
> object exists, it checks that it is not the same thread before attempting to 
> lock it; if it is the same thread, then waitForLock() returns success, which 
> allows generation to proceed. You lose the efficiency of generating the 
> cache only once in this case, but at least it doesn't hang! With JDK1.5 this 
> can be made neater by using Thread#holdsLock() instead of adding the thread 
> object itself to the transient store.
> See patch file.
> However, even with this fix, parallel includes (when enabled) may still hang, 
> because they pass the not-the-same-thread test, but fail because the root 
> pipeline, which holds the initial lock, cannot complete (and therefore 
> satisfy the lock condition for the parallel threads), before the threads 
> themselves have completed, which then results in a deadlock again.
> The complete solution is probably to avoid locking if the lock is held by the 
> same top-level Request, but that requires more knowledge of Cocoon's 
> processing than I (currently) have!
> IMHO unless a complete solution is found to this, then this optimization 
> should be removed completely, or else made optional by configuration, since 
> it renders the IncludeTransformer dangerous.
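
The "complete solution" direction sketched above, keying lock ownership by the 
top-level request rather than by thread, could look roughly like this. It is a 
hypothetical sketch, not a Cocoon API: an InheritableThreadLocal stands in for 
"knowledge of the top-level Request", and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class RequestScopedLocks {
    /** Inheritable, so threads spawned for parallel includes see the same id. */
    private static final InheritableThreadLocal<String> REQUEST_ID =
            new InheritableThreadLocal<>();

    private final Map<String, String> lockOwners = new HashMap<>();

    public static void enterRequest(String id) {
        REQUEST_ID.set(id);
    }

    public synchronized boolean generateLock(String key) {
        if (lockOwners.containsKey(key)) {
            return false;
        }
        lockOwners.put(key, REQUEST_ID.get());
        return true;
    }

    /** Any thread working for the owning request proceeds instead of waiting. */
    public synchronized boolean waitForLock(String key) throws InterruptedException {
        String owner;
        while ((owner = lockOwners.get(key)) != null) {
            if (owner.equals(REQUEST_ID.get())) {
                return true; // same top-level request: skip the wait
            }
            wait(); // woken by releaseLock()'s notifyAll()
        }
        return true;
    }

    public synchronized void releaseLock(String key) {
        lockOwners.remove(key);
        notifyAll();
    }

    public static void main(String[] args) throws Exception {
        RequestScopedLocks locks = new RequestScopedLocks();
        enterRequest("req-1");
        locks.generateLock("foo.xml");
        // A parallel-include worker spawned by req-1 inherits the request id,
        // so it passes waitForLock instead of deadlocking on its own request:
        Thread worker = new Thread(() -> {
            try {
                System.out.println("worker-proceeds=" + locks.waitForLock("foo.xml"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        worker.join();
        locks.releaseLock("foo.xml");
    }
}
```

This handles both the same-thread include case and the parallel-include case, 
at the cost of propagating a request identity to every worker thread.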

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.