Sylvain Wallez wrote:

Hi all,

I'm currently working on a publication application with complex database queries where we want to prefetch some of the pages linked to by the page currently being produced, in order to speed up response time on pages that are likely to be asked for by users.

To achieve this, we have a "PrefetchTransformer" that grabs elements having a prefetch="true" attribute and starts a background job that loads the corresponding "src" or "href" target through a "cocoon:" URL.

At first I used JobScheduler.fireJob() to schedule for immediate execution, but ran into *weird* bugs with strange NPEs all over the pipeline components. After analysis, it appeared that while the scheduler thread was processing the pipeline, the http thread was recycling the *background environment*, thus nulling the object model and other class attributes used by pipeline components.

I spent the *whole day* trying to find the cause for this, without success (how frustrating).

Then I decided to try another approach and use JobScheduler.fireJobAt(new Date()), meaning "schedule the job for later execution... now!". And it worked!

Weird, weird, weird! Does anybody have a hint about why fireJob() mixes up environments like this?


Actually, fireJobAt() is also broken with another test case. Desperately searching for the cause, I went back to basics, i.e. "new Thread(runnable).start()". That was broken too, but it helped me finally find the cause :-)

The problem lies in CocoonComponentManager.addForAutomaticRelease().

The environmentStack is a CloningInheritableThreadLocal. That means that when we create a new thread, it inherits the environment stack of its parent thread.
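
To illustrate the inheritance, here is a minimal sketch using the JDK's plain InheritableThreadLocal with a copying childValue(). It is not Cocoon's CloningInheritableThreadLocal; the class name InheritedStackDemo and the use of strings as stand-ins for environments are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the environment stack: a thread-local list
// whose value is copied into every thread created by the current one.
class InheritedStackDemo {

    static final InheritableThreadLocal<List<String>> STACK =
        new InheritableThreadLocal<List<String>>() {
            @Override
            protected List<String> initialValue() {
                return new ArrayList<>();
            }

            @Override
            protected List<String> childValue(List<String> parentValue) {
                // The list is copied, but its elements stay shared
                // between the parent and the child thread.
                return new ArrayList<>(parentValue);
            }
        };

    /** Starts a child thread and returns the stack it sees on startup. */
    static List<String> stackSeenByChild() {
        final List<String> seen = new ArrayList<>();
        Thread child = new Thread(() -> seen.addAll(STACK.get()));
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        STACK.get().add("http-request-environment");
        // The child never pushed anything, yet its stack is not empty.
        System.out.println(stackSeenByChild()); // prints [http-request-environment]
    }
}
```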

The result is that threads created by Cocoon *always* inherit an environment stack of at least size 1:
- in the cron block, that's the environment of the first http request, which created the Cocoon object
- for "new Thread()", that's the same as above, plus all sitemaps that we've been through when we create the thread.

Now let's look at InvokeContext.getProcessingPipeline() (in treeprocessor): if this is an internal request, the pipeline object is added for automatic release. I guess this is to avoid memory leaks if ever we forget to call resolver.release() on a sitemap source.

Following this path, let's go to CocoonComponentManager.addForAutomaticRelease(). The component that has to be autoreleased is added to a list attached to the *first* environment of the stack (because of "stack.get(0)"), and is therefore released when we exit this environment.

Now what happens when we create a thread that runs in the background? The end of processing of the *http* request releases pipeline objects of all child threads of the servlet engine's thread (the one which processes the http request). If the background thread uses a "cocoon:" URL and is currently executing the corresponding pipeline, recycle() is called on all pipeline components and bang, NPEs all around the place!!
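
The sequence above can be reproduced in a small simulation. This is only a sketch under assumptions: Pipeline, Environment and this addForAutomaticRelease() are simplified, hypothetical stand-ins written for the example, not the real Cocoon classes, and latches force the ordering that happens by chance in production.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

class AutoReleaseDemo {

    /** Hypothetical pipeline component: recycle() nulls its state. */
    static class Pipeline {
        Object objectModel = new Object();
        void recycle() { objectModel = null; }
    }

    /** Hypothetical environment that releases registered components on exit. */
    static class Environment {
        private final List<Pipeline> autoRelease = new ArrayList<>();
        synchronized void add(Pipeline p) { autoRelease.add(p); }
        synchronized void leave() {
            for (Pipeline p : autoRelease) p.recycle();
            autoRelease.clear();
        }
    }

    static final InheritableThreadLocal<List<Environment>> STACK =
        new InheritableThreadLocal<List<Environment>>() {
            @Override
            protected List<Environment> childValue(List<Environment> parent) {
                // New list, same Environment objects as the parent thread.
                return parent == null ? null : new ArrayList<>(parent);
            }
        };

    /** Mirrors the "stack.get(0)" behaviour: always the *first* environment. */
    static void addForAutomaticRelease(Pipeline p) {
        STACK.get().get(0).add(p);
    }

    /** Returns whether the background pipeline is intact after the request ends. */
    static boolean backgroundPipelineSurvives() {
        Environment httpEnv = new Environment();
        List<Environment> stack = new ArrayList<>();
        stack.add(httpEnv);
        STACK.set(stack);

        CountDownLatch registered = new CountDownLatch(1);
        CountDownLatch requestDone = new CountDownLatch(1);
        AtomicBoolean intact = new AtomicBoolean();

        // The background thread inherits a stack whose first entry is httpEnv.
        Thread background = new Thread(() -> {
            Pipeline pipeline = new Pipeline();
            addForAutomaticRelease(pipeline);
            registered.countDown();
            try {
                requestDone.await(); // still "executing" the pipeline
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            intact.set(pipeline.objectModel != null);
        });
        background.start();
        try {
            registered.await();
            httpEnv.leave(); // end of the http request recycles the pipeline
            requestDone.countDown();
            background.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            requestDone.countDown();
            STACK.remove();
        }
        return intact.get();
    }

    public static void main(String[] args) {
        System.out.println(backgroundPipelineSurvives()); // prints false
    }
}
```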

And this leads to very random bugs: since servlet engines use thread pools, this erroneous pipeline release happens only when the servlet engine reuses the thread that initially loaded CocoonServlet. And NPEs happen when this first thread is used *and* a scheduler thread is executing a "cocoon:" pipeline. Weird...

So the question is:
- why does the environment stack have to be inherited by child threads? Is it to keep the current context? Then isn't inheriting the current processor and uri context enough?
- why is the pipeline automatically released? Is it to avoid memory leaks?

Possible remedies would be to remove one of the above features, but I guess they're there for a reason.

So what about adding CocoonComponentManager.clearEnvironmentStack(), that could be called by CocoonQuartzJobExecutor, or even DefaultThreadFactory (in o.a.c.c.thread) before running the job?
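
A rough sketch of what this could look like, again on a stand-in thread-local rather than on CocoonComponentManager itself. The method clearEnvironmentStack() does not exist yet, and withCleanStack() is a hypothetical helper showing where an executor or thread factory might call it.

```java
import java.util.ArrayList;
import java.util.List;

class ClearStackDemo {

    // Stand-in for the inherited environment stack (strings instead of environments).
    static final InheritableThreadLocal<List<String>> STACK =
        new InheritableThreadLocal<List<String>>() {
            @Override
            protected List<String> initialValue() {
                return new ArrayList<>();
            }

            @Override
            protected List<String> childValue(List<String> parentValue) {
                return new ArrayList<>(parentValue);
            }
        };

    /** Hypothetical counterpart of the proposed clearEnvironmentStack(). */
    static void clearEnvironmentStack() {
        STACK.set(new ArrayList<>());
    }

    /** Wraps a job so that it starts with an empty environment stack. */
    static Runnable withCleanStack(Runnable job) {
        return () -> {
            clearEnvironmentStack();
            job.run();
        };
    }

    /** Returns the stack size a wrapped background job observes. */
    static int stackSizeSeenByJob() {
        STACK.get().add("http-request-environment");
        final int[] size = { -1 };
        Thread worker = new Thread(withCleanStack(() -> size[0] = STACK.get().size()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            STACK.remove();
        }
        return size[0];
    }

    public static void main(String[] args) {
        System.out.println(stackSizeSeenByJob()); // prints 0
    }
}
```

With the stack cleared, anything the job registers for automatic release would be tied to an environment the job pushes itself, not to the http request that happened to create the thread.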

Thoughts?

Sylvain

--
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director
