Hi team,

Recently Vadim started to scratch the VPC itch (for those who wonder, VPC = "virtual sitemap component"), and he pinged me a few times on ICQ to discuss their implementation. And I'm happy to say that we found what I consider a simple and elegant solution to many problems, even ones we have today.

Enough teasing, let's explain it all now. Warning: you may get lost if you don't know how the pipeline and sitemap engine work :-)

Implementing VPCs means that we have to wrap a ProcessingNode (the execution unit of the sitemap interpreter) in an implementation of Generator, Transformer, Serializer or Reader. The problem is that these objects have very different lifecycles:
- a VPC's ProcessingNode is attached to a Processor instance (a sitemap) and is known at sitemap build time.
- a sitemap component (G, T, S or R) is poolable and therefore created on demand when used.


This "on demand" creation means it can occur e.g. in a subsitemap, i.e. in an environment context (used to resolve relative URIs) that is different from the one where the component was actually declared. We see some effects of this with the I18nTransformer, which sometimes fails to load message catalogues if its first use is in a child sitemap of the one where it was declared. We'll come back to this later.

                          --- oOo ---

So what object does a component have access to that is directly related to the actual sitemap where it was declared? One such object is the ServiceManager (SM), as each sitemap defines a new SM to hold its components. However, extending the SM interface to provide access to additional data is a bad idea, as it ties components to a particular extension of the Avalon framework contract. Now if we look at the Avalon lifecycle, we see the Contextualizable interface, through which the SM passes to the components it manages the Avalon Context it was itself contextualized with.

Currently, there's only one Context object throughout the whole Cocoon system, created by the top-level environment (servlet or CLI) and holding either webapp-wide data (e.g. the work directory, the servlet context, etc) or request-specific data (object model, etc). We can change this so that each sitemap defines its own Avalon Context that will be passed to every Contextualizable component managed by its local SM.

That per-sitemap Context can then hold any information known at sitemap-build time that could be needed by VPCs, whatever environment they are created in (subsitemap, remote blocks, etc).
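
To make the idea concrete, here is a rough Java sketch of a per-sitemap context that falls back to its parent for anything it doesn't define itself. SitemapContext and the key names are made up for illustration; this is not the Avalon Context API, just a stand-in showing the lookup behaviour I have in mind.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a per-sitemap context; NOT the real Avalon API.
class SitemapContext {
    private final SitemapContext parent;   // null for the top-level (webapp-wide) context
    private final Map<String, Object> entries = new HashMap<>();

    SitemapContext(SitemapContext parent) {
        this.parent = parent;
    }

    void put(String key, Object value) {
        entries.put(key, value);
    }

    // Look up locally first, then walk up the parent chain.
    Object get(String key) {
        if (entries.containsKey(key)) {
            return entries.get(key);
        }
        if (parent != null) {
            return parent.get(key);
        }
        throw new IllegalArgumentException("No context entry for key: " + key);
    }

    public static void main(String[] args) {
        // Key names below are illustrative assumptions.
        SitemapContext root = new SitemapContext(null);
        root.put("work-directory", "/tmp/cocoon-work");
        root.put("sitemap-base-uri", "file:/webapp/");

        SitemapContext child = new SitemapContext(root);
        child.put("sitemap-base-uri", "file:/webapp/shop/");

        System.out.println(child.get("work-directory"));    // inherited from the parent
        System.out.println(child.get("sitemap-base-uri"));  // defined by the child sitemap
    }
}
```

A component contextualized by the child sitemap's SM would see the child's base URI but still find the webapp-wide entries.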

So the TreeBuilder can put in the context a Map for each kind of sitemap component that associates VPC names to the corresponding ProcessingNode. The VPC sitemap component implementation can then, _whatever the environment it is created in_, get its associated processing node and invoke it to build a partial pipeline that will behave like a "regular" component.
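
Roughly, the lookup could work like the following Java sketch. Everything here is a simplification under my own assumptions: ProcessingNode is reduced to a one-method interface, and VpcRegistry and VirtualGenerator are hypothetical names, not existing Cocoon classes.

```java
import java.util.HashMap;
import java.util.Map;

class VpcLookupSketch {
    // Simplified stand-in for the sitemap interpreter's execution unit (assumption).
    interface ProcessingNode {
        String invoke(String input);
    }

    // Filled once at sitemap build time: component kind -> (VPC name -> node).
    static final class VpcRegistry {
        private final Map<String, Map<String, ProcessingNode>> nodesByKind = new HashMap<>();

        void register(String kind, String name, ProcessingNode node) {
            nodesByKind.computeIfAbsent(kind, k -> new HashMap<>()).put(name, node);
        }

        ProcessingNode lookup(String kind, String name) {
            Map<String, ProcessingNode> nodes = nodesByKind.get(kind);
            ProcessingNode node = (nodes == null) ? null : nodes.get(name);
            if (node == null) {
                throw new IllegalStateException("No virtual " + kind + " named '" + name + "'");
            }
            return node;
        }
    }

    // Pooled component created on demand; it only needs the registry found in its context.
    static final class VirtualGenerator {
        private final ProcessingNode node;

        VirtualGenerator(VpcRegistry registryFromContext, String name) {
            this.node = registryFromContext.lookup("generator", name);
        }

        String generate(String src) {
            return node.invoke(src);
        }
    }

    public static void main(String[] args) {
        VpcRegistry registry = new VpcRegistry();
        registry.register("generator", "foo", src -> "pipeline built for " + src);

        // Wherever this instance is created, it finds the node of the declaring sitemap.
        VirtualGenerator generator = new VirtualGenerator(registry, "foo");
        System.out.println(generator.generate("docs/index.xml"));
    }
}
```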

"Partial" pipeline means that we will need some special implementations of the Pipeline interface that accept incomplete pipelines. Vadim already started working on this and for example, the pipeline for a virtual serializer won't accept a generator, but will accept zero or more transformer and will require a serializer.

Now that we know how to implement VPCs as regular components, on to source resolving...

                         --- oOo ---

The problem with source resolving is that the base URI used to resolve relative URIs changes when we enter a subsitemap: relative sources are relative to the directory containing the "current" sitemap.

That means that the base URI used to resolve e.g. the "src" attribute of a <map:generate> is the one of the sitemap containing that statement, and not the sitemap where the component was declared, which can be a parent sitemap of the current one.

This isn't a problem with URIs that are part of a statement ("src" attribute and <map:parameter>) but is a real problem for URIs that are part of the component configuration. That's what happens with the I18nTransformer, as catalogue locations are URIs defined in the component declaration, thus relative to the sitemap where the component is _declared_. Unfortunately, they are resolved relative to where the component is first _instantiated_, which can happen unpredictably in that sitemap or any of its child sitemaps, depending on how pools are managed. The practical result is that we cannot reliably declare an i18n transformer for use by a tree of subsitemaps.

Now that we have a per-sitemap Avalon Context, we can also store in that context the base URI of the sitemap declaring the component. The i18n transformer just has to use that base URI to access the catalogues defined in its configuration.

That's what I called "multi-relative" source resolving in the subject of this post: URIs coming from a component's configuration will have to be resolved relative to the base URI contained in the Avalon context, whereas URIs coming from sitemap statements are resolved using the base URI of the sitemap that is currently executing.
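
A minimal Java sketch of the two resolution rules, using plain java.net.URI instead of Cocoon's SourceResolver. The class and method names, as well as the example paths, are assumptions made for illustration.

```java
import java.net.URI;

// Hypothetical helper showing the two base URIs involved in "multi-relative" resolving.
class MultiRelativeResolver {
    private final URI declaringBase;  // base URI of the sitemap that declared the component
    private final URI currentBase;    // base URI of the sitemap currently executing

    MultiRelativeResolver(URI declaringBase, URI currentBase) {
        this.declaringBase = declaringBase;
        this.currentBase = currentBase;
    }

    // URIs found in the component configuration (e.g. i18n catalogue locations).
    URI resolveConfigurationUri(String uri) {
        return declaringBase.resolve(uri);
    }

    // URIs found in sitemap statements ("src" attribute, <map:parameter>).
    URI resolveStatementUri(String uri) {
        return currentBase.resolve(uri);
    }

    public static void main(String[] args) {
        MultiRelativeResolver resolver = new MultiRelativeResolver(
                URI.create("file:/webapp/"),        // component declared in the root sitemap
                URI.create("file:/webapp/shop/"));  // but first used in a subsitemap

        System.out.println(resolver.resolveConfigurationUri("translations/messages.xml"));
        System.out.println(resolver.resolveStatementUri("content/page.xml"));
    }
}
```

With this split, the catalogue location no longer depends on which sitemap happened to instantiate the transformer first.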

Still following? Now let's see source resolving in VPCs...

                         --- oOo ---

With VPCs, the problem is worse than with regular components, as VPCs are components defined by sitemap snippets with their own "src" and <map:parameter>. So what does "relative" mean in this context? Is it relative to the calling sitemap or relative to the sitemap that defines the VPC? The answer is "it depends"!

It depends on whether the URI is passed from the calling environment (it's then relative to the calling sitemap) or is some local data used by the VPC implementation such as an XSLT (it's then relative to the sitemap defining the VPC).

So how do we distinguish them? A solution was proposed [1] where we added some typing information to the sitemap statements calling the VPC, so that URIs could be absolutized before the actual call.

That is actually wrong, as it forces the user of a component to explicitly indicate that some particular action should be taken on a parameter, whereas this information belongs to the implementation of the component. Furthermore, forgetting to specify that absolutization has to be performed can lead to weird behaviours that are difficult to debug.

So, it's the VPC's responsibility to make explicit in its definition which values coming from the caller have to be absolutized relative to the calling sitemap.

For this, I propose that VPC definitions have additional statements defining which parameters have to be absolutized, e.g.:

<map:generator name="foo">
 <map:absolutize param="src"/>
 <map:absolutize param="bar"/>

 <map:generate type="file" src="{src}">
   <map:parameter name="baz" value="{bar}"/>
 </map:generate>
 <map:transform src="data/{skin}.xslt"/>
</map:generator>

The input parameters "src" (actually the "src" attribute in the calling statement) and "bar" are first absolutized relative to the calling sitemap, and then the base URI of the sitemap defining the VPC becomes the new relative context, used e.g. to resolve "data/{skin}.xslt".
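
In Java terms, the two steps could look like the sketch below. It uses java.net.URI only; the absolutize() helper, the class name and the example values are hypothetical, and the real thing would of course go through the sitemap's source resolver.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class AbsolutizeSketch {
    // Returns a copy of the caller's parameters in which the declared ones
    // have been made absolute against the calling sitemap's base URI.
    static Map<String, String> absolutize(Map<String, String> params,
                                          Set<String> declared,
                                          URI callerBase) {
        Map<String, String> result = new HashMap<>(params);
        for (String name : declared) {
            String value = params.get(name);
            if (value != null) {
                result.put(name, callerBase.resolve(value).toString());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        URI callerBase = URI.create("file:/webapp/shop/");  // sitemap calling the VPC
        URI vpcBase = URI.create("file:/webapp/");          // sitemap defining the VPC

        Map<String, String> params = new HashMap<>();
        params.put("src", "catalog.xml");
        params.put("bar", "extra/info.xml");
        params.put("skin", "blue");

        // Step 1: absolutize "src" and "bar" relative to the caller.
        Map<String, String> resolved = absolutize(params, Set.of("src", "bar"), callerBase);

        // Step 2: everything else is resolved relative to the defining sitemap.
        // An already absolute URI is left unchanged by resolve().
        System.out.println(vpcBase.resolve(resolved.get("src")));
        System.out.println(vpcBase.resolve("data/" + resolved.get("skin") + ".xslt"));
    }
}
```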

That way, we can also implement multi-relative source resolving in sitemap statements.

We may actually want to go a bit further by allowing any computation to provide input parameters using input modules, e.g.:
<map:generator name="foo">
<map:parameter name="src" value="{absolutize:{src}}"/>
...


But the source-resolving problem is not finished...

                         --- oOo ---

The last source-resolving problem is related to URIs that may be present in the SAX stream, e.g. XInclude URIs. What are they relative to?

My feeling here is that we need to distinguish for a single VPC the base URI used to resolve URIs within the setup phase (i.e. "src" and <map:parameter>) and the base URI used to resolve URIs during the processing phase.

That could be achieved using an additional attribute on the component declaration, i.e. in the above example something like

<map:generator name="foo" stream-uris-base="local|caller">
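
To show what the attribute would select, here is a small Java sketch. The attribute name and its two values come from the example above; the enum, the method names and the example paths are my own assumptions.

```java
import java.net.URI;

class StreamUriBaseSketch {
    // The two candidate bases for URIs found in the SAX stream.
    enum Mode { LOCAL, CALLER }

    // Maps the value of the proposed "stream-uris-base" attribute to a mode.
    static Mode parseMode(String attributeValue) {
        switch (attributeValue) {
            case "local":
                return Mode.LOCAL;
            case "caller":
                return Mode.CALLER;
            default:
                throw new IllegalArgumentException(
                        "stream-uris-base must be 'local' or 'caller', got: " + attributeValue);
        }
    }

    // Resolves a URI met during the processing phase (e.g. an XInclude href).
    static URI resolveStreamUri(String uriInStream, Mode mode, URI vpcBase, URI callerBase) {
        URI base = (mode == Mode.LOCAL) ? vpcBase : callerBase;
        return base.resolve(uriInStream);
    }

    public static void main(String[] args) {
        URI vpcBase = URI.create("file:/webapp/");          // sitemap defining the VPC
        URI callerBase = URI.create("file:/webapp/shop/");  // sitemap calling the VPC

        System.out.println(resolveStreamUri("includes/header.xml", parseMode("local"), vpcBase, callerBase));
        System.out.println(resolveStreamUri("includes/header.xml", parseMode("caller"), vpcBase, callerBase));
    }
}
```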

Now we should have considered every source-resolving problem :-)

                         --- oOo ---

Ok, thanks for reading so far.

As a conclusion, the main change in the current architecture that leads to solving a great number of problems is that we will now have a per-sitemap Avalon Context rather than a single webapp-wide one.

That context will contain:
- ProcessingNodes to be wrapped as regular components,
- the base URI of the associated sitemap,
and will of course inherit all other entries from its parent context.

Once we have that, many things will follow and although there are still some details to be sorted out such as in-stream URIs, I think we now have an answer to most if not all the nasty questions that were somehow blocking the implementation of VPCs.

And as VPCs are an important part of the real blocks puzzle, the next step will be to integrate all this with the new kernel.

Thanks a lot to Vadim for starting the work on VPCs and triggering all these thoughts.

Thoughts, comments?


Sylvain

[1] http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=109829205826400&w=2

--
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }


