[ https://issues.apache.org/jira/browse/COCOON-1985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578347#action_12578347 ]
Alexander Daniel commented on COCOON-1985:
------------------------------------------

Two requests can deadlock each other in Cocoon 2.1.11 (without parallel inclusion being enabled in the include transformer):
* request A: generating lock for 55933
* request B: generating lock for 58840
* request B: waiting for lock 55933 which is held by request A
* request A: waiting for lock 58840 which is held by request B

I can reproduce this behaviour with Apache Bench and the following pipeline:
* terminal 1: Apache Bench request A
  (ab -k -n 10000 -c 25 http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/55933/)
* terminal 2: Apache Bench request B
  (ab -k -n 10000 -c 25 http://localhost:8888/samples/reproduceMultipleThreads/productOfferForDevice/58840/)
* terminal 3: touching the two data files every second to invalidate the cache
  (while true; do echo -n "."; touch 55933.xml 58840.xml; sleep 1; done)
* pipeline:

<map:pipeline type="caching">
  <map:match pattern="productOfferForDevice*/*/">
    <map:generate src="cocoon:/exists/{2}.xml" label="a"/>
    <map:transform type="xsltc" src="productOfferIncludeDevice.xsl" label="b">
      <map:parameter name="noInc" value="{1}"/>
    </map:transform>
    <map:transform type="include" label="c"/>
    <map:serialize type="xml"/>
  </map:match>

  <map:match pattern="exists/**">
    <map:act type="resource-exists">
      <map:parameter name="url" value="{1}" />
      <map:generate src="{../1}" />
      <map:serialize type="xml" />
    </map:act>
    <!-- not found -->
    <map:generate src="dummy.xml" />
    <map:serialize type="xml" />
  </map:match>
</map:pipeline>

After some seconds the deadlock occurs:
* The Apache Bench requests run into a timeout.
* I can see the following pipe locks in the default transient store:

PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1 (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1 (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml (class: org.mortbay.util.ThreadPool$PoolThread)
PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml (class: org.mortbay.util.ThreadPool$PoolThread)

I added some logging to AbstractCachingProcessingPipeline.java, which confirms the explanation above:

INFO (2008-03-13) 13:50.16:072 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/55933.xml?pipelinehash=-910770960103935149_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO (2008-03-13) 13:50.16:074 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO (2008-03-13) 13:50.16:075 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-cocoon://samples/reproduceMultipleThreads/exists/58840.xml?pipelinehash=-4996088883111986478_T-xsltc-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/productOfferIncludeDevice.xsl;noInc=_T-include-I_S-xml-1
INFO (2008-03-13) 13:50.16:075 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: generating lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml
INFO (2008-03-13) 13:50.16:281 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/58840/) PoolThread-6/AbstractCachingProcessingPipeline: waiting for lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/55933.xml
INFO (2008-03-13) 13:50.16:304 [sitemap] (/samples/reproduceMultipleThreads/productOfferForDevice/55933/) PoolThread-47/AbstractCachingProcessingPipeline: waiting for lock PIPELOCK:PK_G-file-file:///Users/alex/dev/cocoon/cocoon-2.1.11/build/webapp/samples/reproduceMultipleThreads/58840.xml

With the attached reproduceMultipleThreads.tar.gz you can reproduce the behaviour yourself:
* download and extract Cocoon 2.1.11
* cd $CocoonHome
* ./build.sh
* cd build/webapp/samples
* tar -xzf $DownloadFolder/reproduceMultipleThreads.tar.gz
* cd ../../..
* ./cocoon.sh
* open 3 terminals and cd into $CocoonHome/build/webapp/samples/reproduceMultipleThreads in each
* dry run without invalidating the cache, to see that everything is working:
  - terminal 1: ./terminal1.sh
  - terminal 2: ./terminal2.sh
* run with the cache invalidated every second:
  - terminal 1: ./terminal1.sh
  - terminal 2: ./terminal2.sh
  - terminal 3: ./terminal3.sh
* When Apache Bench has run into a timeout, you can view the pipelocks at
  http://localhost:8888/samples/reproduceMultipleThreads/pipelocks

We are currently facing this issue on our production servers.

WHAT IS THE BEST WAY TO FIX THAT?
* removing the pipeline locking code as Ellis suggested?
* making waitForLock fuzzy
* ...

If the pipelock design is the same in Cocoon 2.2, the same deadlock could occur there.

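One possible reading of "making waitForLock fuzzy" is a bounded wait: a request waits for a lock only for a limited time and then gives up instead of blocking forever. The following is a minimal sketch of that idea, not the code in AbstractCachingProcessingPipeline. The class name, the ConcurrentMap standing in for the transient store, and the timeout parameter are all assumptions made for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only: a hypothetical stand-in for the pipeline lock handling,
// not the code in AbstractCachingProcessingPipeline.
class BoundedPipelineLockSketch {
    // Assumed replacement for the transient store: lock key -> lock marker.
    private final ConcurrentMap<String, Object> store = new ConcurrentHashMap<>();

    /** Registers a lock for the key; returns false if one already exists. */
    boolean generateLock(String key) {
        return store.putIfAbsent(key, new Object()) == null;
    }

    /** Removes the lock for the key and wakes up every waiting thread. */
    void releaseLock(String key) {
        Object lock = store.remove(key);
        if (lock != null) {
            synchronized (lock) {
                lock.notifyAll();
            }
        }
    }

    /**
     * Waits until the lock is released, but for at most timeoutMillis.
     * Returns true if the lock is gone, false on timeout or interruption.
     */
    boolean waitForLock(String key, long timeoutMillis) {
        Object lock = store.get(key);
        if (lock == null) {
            return true;
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            // Re-check the store on every wake-up to guard against spurious wake-ups.
            while (store.get(key) == lock) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false;
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }
}
```

With such a scheme, a request that gets false back could generate the content itself rather than keep waiting. In the A/B scenario above that would cost some duplicate work, but neither request would block the other indefinitely.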
> AbstractCachingProcessingPipeline locking with IncludeTransformer may hang pipeline
> -----------------------------------------------------------------------------------
>
>                 Key: COCOON-1985
>                 URL: https://issues.apache.org/jira/browse/COCOON-1985
>             Project: Cocoon
>          Issue Type: Bug
>          Components: * Cocoon Core
>    Affects Versions: 2.1.9, 2.1.10, 2.1.11, 2.2-dev (Current SVN)
>            Reporter: Ellis Pritchard
>            Priority: Critical
>             Fix For: 2.2-dev (Current SVN)
>
>         Attachments: caching-trials.patch, includer.xsl, patch.txt, sitemap.xmap
>
>
> Cocoon 2.1.9 introduced the concept of a lock in
> AbstractCachingProcessingPipeline, an optimization to prevent two concurrent
> requests from generating the same cached content. The first request adds the
> pipeline key to the transient cache to 'lock' the cache entry for that
> pipeline; subsequent concurrent requests wait for the first request to cache
> the content (by Object.lock()ing the pipeline key entry) before proceeding,
> and can then use the newly cached content.
> However, this has introduced an incompatibility with the IncludeTransformer:
> if the inclusions access the same yet-to-be-cached content as the root
> pipeline, the whole assembly hangs, since a lock will be made on a lock
> already held by the same thread, and which cannot be satisfied.
> e.g.
> i) Root pipeline generates using sub-pipeline cocoon:/foo.xml
> ii) the cocoon:/foo.xml sub-pipeline adds its pipeline key to the transient
> store as a lock.
> iii) subsequently in the root pipeline, the IncludeTransformer is run.
> iv) one of the inclusions also generates with cocoon:/foo.xml; this
> sub-pipeline locks in AbstractProcessingPipeline.waitForLock() because the
> sub-pipeline key is already present.
> v) deadlock.
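
Steps i) to v) above can be replayed in a few lines. This is a rough trace under stated assumptions, not Cocoon code: the HashMap stands in for the transient store, and the thread name is stored only so the trace can tell who holds the entry (the description above says the real entry is a plain Object).

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative trace only; class and method names are hypothetical.
class SameThreadLockTrace {
    /** Replays steps i) to v) in one thread; returns true if the thread would wait on itself. */
    static boolean waitsOnItself() {
        // Assumed stand-in for the transient store: pipeline key -> name of the holding thread.
        Map<String, String> store = new HashMap<>();
        String key = "cocoon:/foo.xml";
        String requestThread = Thread.currentThread().getName();

        // i) + ii) The root pipeline generates from the sub-pipeline, which
        // registers its key in the store as a lock.
        store.put(key, requestThread);

        // iii) The IncludeTransformer runs later in the same thread. The root
        // pipeline has not finished, so the entry is still in the store.

        // iv) An inclusion asks for the same key, finds the entry and would
        // therefore call waitForLock().
        String holder = store.get(key);
        boolean mustWait = holder != null;

        // v) Only the holder can remove the entry. If the holder is the thread
        // that is about to wait, nobody is left to release the lock.
        return mustWait && holder.equals(requestThread);
    }
}
```

Calling SameThreadLockTrace.waitsOnItself() returns true, which corresponds to step v): the waiting thread is the only one that could ever release the lock.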
> I've found a (partial, see below) solution for this: instead of a plain
> Object being added to the transient store as the lock object, the
> Thread.currentThread() is added; when waitForLock() is called, if the lock
> object exists, it checks that it is not the same thread before attempting to
> lock it; if it is the same thread, then waitForLock() returns success, which
> allows generation to proceed. You lose the efficiency of generating the
> cache only once in this case, but at least it doesn't hang! With JDK1.5 this
> can be made neater by using Thread#holdsLock() instead of adding the thread
> object itself to the transient store.
> See patch file.
> However, even with this fix, parallel includes (when enabled) may still hang,
> because they pass the not-the-same-thread test, but fail because the root
> pipeline, which holds the initial lock, cannot complete (and therefore
> satisfy the lock condition for the parallel threads), before the threads
> themselves have completed, which then results in a deadlock again.
> The complete solution is probably to avoid locking if the lock is held by the
> same top-level Request, but that requires more knowledge of Cocoon's
> processing than I (currently) have!
> IMHO unless a complete solution is found to this, then this optimization
> should be removed completely, or else made optional by configuration, since
> it renders the IncludeTransformer dangerous.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
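
The same-thread check described in the quoted text can be sketched roughly as follows. This is not the attached patch. The class and method names are hypothetical, the map stands in for the transient store, and two details are my own assumptions: the owning thread is wrapped in a small marker object rather than stored directly (to avoid waiting on a Thread's own monitor), and the wait is bounded so that the sketch always terminates.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative only; not the patch attached to the issue.
class OwnerAwareLockSketch {
    /** Lock marker that remembers which thread created it. */
    private static final class Marker {
        final Thread owner = Thread.currentThread();
    }

    // Assumed stand-in for the transient store: lock key -> marker.
    private final ConcurrentMap<String, Marker> store = new ConcurrentHashMap<>();

    /** Registers a lock owned by the calling thread; false if the key is already locked. */
    boolean generateLock(String key) {
        return store.putIfAbsent(key, new Marker()) == null;
    }

    /** Removes the lock and wakes up every waiting thread. */
    void releaseLock(String key) {
        Marker marker = store.remove(key);
        if (marker != null) {
            synchronized (marker) {
                marker.notifyAll();
            }
        }
    }

    /** Returns true at once for the owning thread; other threads wait up to timeoutMillis. */
    boolean waitForLock(String key, long timeoutMillis) {
        Marker marker = store.get(key);
        if (marker == null || marker.owner == Thread.currentThread()) {
            return true; // no lock, or our own lock: proceed and regenerate
        }
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (marker) {
            while (store.get(key) == marker) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return false;
                }
                try {
                    marker.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return true;
    }

    /** Helper for experiments: calls waitForLock from a second thread and reports its result. */
    boolean waitFromOtherThread(String key, long timeoutMillis) {
        boolean[] result = new boolean[1];
        Thread other = new Thread(() -> result[0] = waitForLock(key, timeoutMillis));
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result[0];
    }
}
```

In this sketch the owning thread passes straight through waitForLock(), while a second thread still waits on the owner. That matches the limitation noted for parallel includes: a worker thread is not the owner, so it waits for a root pipeline that cannot finish until the worker does.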