On Sun, Jun 29, 2003 at 09:08:14PM +1200, Conal Tuohy wrote:
> Jeff Turner wrote:
>
> > > That's an issue I've come up against too - it seems that views are
> > > still too "tangled" up with labels and can't cut across pipelines
> > > properly. At least, that's how I understand it - maybe I'm missing
> > > something?
> >
> > I think labels and Views are independent of each other. You can have
> > a view defined with 'from-position', and not use labels. Labels are
> > just generic markers, with nothing to say they're only useful for
> > defining views.
>
> But with from-position you can have only "first" and "last" which is
> even more restrictive than labels. If you want to do anything very
> sophisticated don't you need labels?
Yes, labels and positions. What else could there be?
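For reference, the two mechanisms look roughly like this. It is a
sketch modelled on the views in Cocoon's default sitemap; the pipeline,
its match pattern and doc2html.xsl are illustrative only, and the
map:components section is omitted:

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- map:components omitted: assumes the usual generators,
       transformers and serializers (including "links") are declared -->
  <map:views>
    <!-- Position-based: starts just before the pipeline's serializer -->
    <map:view name="links" from-position="last">
      <map:serialize type="links"/>
    </map:view>
    <!-- Label-based: starts after whichever component has label="content" -->
    <map:view name="content" from-label="content">
      <map:serialize type="xml"/>
    </map:view>
  </map:views>
  <map:pipelines>
    <map:pipeline>
      <map:match pattern="*.html">
        <map:generate src="{1}.xml" label="content"/>
        <map:transform src="doc2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

A request for foo.html?cocoon-view=content would then return the
untransformed XML of foo.xml, while cocoon-view=links would return the
links found in the HTML-bound output.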
> > Views give _every_ public URL in a sitemap an alternative form. If
> > you only need an alternative form of some URLs, then that can be done
> > just as you've described above, with a request-param selector.
>
> So ... I could just have used a RequestParamSelector to create my
> different views for the crawler? Damn!
I doubt it. I was just describing when you'd want to use views at all.
The old CLI chose to use views, which means there's no option for
per-pipeline customization.
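For comparison, the request-param approach quoted above, which gives only
selected URLs an alternative form, might look something like the
following. It assumes a RequestParameterSelector declared under the name
"request-parameter"; the "format" parameter and doc2html.xsl are made up
for illustration:

```xml
<map:match pattern="*.html">
  <map:generate src="{1}.xml"/>
  <!-- Assumes RequestParameterSelector is declared as "request-parameter" -->
  <map:select type="request-parameter">
    <!-- Hypothetical parameter name: foo.html?format=content -->
    <map:parameter name="parameter-name" value="format"/>
    <map:when test="content">
      <!-- Alternative form: the raw source XML -->
      <map:serialize type="xml"/>
    </map:when>
    <map:otherwise>
      <!-- Normal form: rendered HTML -->
      <map:transform src="doc2html.xsl"/>
      <map:serialize type="html"/>
    </map:otherwise>
  </map:select>
</map:match>
```

Only this matcher gets the alternative form, which is the per-pipeline
control you want, but a crawler that only speaks cocoon-view has no way
of knowing to ask for it.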
> My problem was that I wanted to use Lucene to index a "content" view of
> 2 different pipelines, one of them based on TEI and another on HTML. In
> the case of the TEI pipeline I didn't want to convert the TEI to HTML
> first and then produce a "content" view based on an HTML-ized view of
> the TEI - I wanted an indexable view of the TEI. This is the same issue
> as you mention below:
>
> > The problem is that Views don't know the type of data they're
> > getting. If we have a view with from-label="content", we know it's
> > content, but what _type_ of content? What schema? What
> > transformation can we apply to create a links-view of this content?
>
> If you could create more than one view with the same name, then we
> could use labels to specify the schema:
>
> e.g. 2 pipelines containing:
> ...
> <map:generate src="{1}.xml" label="tei"/>
> ...
>
> and
>
> <map:transform src="blah-to-html.xsl" label="html"/>
>
> ... and 2 views called "content", one with from-label="tei" and the
> other with from-label="html".
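Spelled out, that proposal would be something like this. It is
hypothetical: it assumes two views may share the name "content", which
(as the "if" above suggests) the sitemap does not currently allow. The
match patterns, tei2html.xsl, tei2index.xsl and html2index.xsl are
invented for illustration:

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:views>
    <!-- HYPOTHETICAL: duplicate view names, one per source schema -->
    <map:view name="content" from-label="tei">
      <map:transform src="tei2index.xsl"/>
      <map:serialize type="xml"/>
    </map:view>
    <map:view name="content" from-label="html">
      <map:transform src="html2index.xsl"/>
      <map:serialize type="xml"/>
    </map:view>
  </map:views>
  <map:pipelines>
    <map:pipeline>
      <!-- TEI pipeline: the view taps the raw TEI, before HTML-izing -->
      <map:match pattern="texts/*.html">
        <map:generate src="texts/{1}.xml" label="tei"/>
        <map:transform src="tei2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
      <!-- HTML pipeline: the view taps the HTML output -->
      <map:match pattern="pages/*.html">
        <map:generate src="pages/{1}.xml"/>
        <map:transform src="blah-to-html.xsl" label="html"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```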
Technically that's more or less the solution. I think a cleaner way
of presenting it is to have one view that interprets different kinds
of data differently:
<map:view name="links" from-label="content">
  <map:select type="xml-type">
    <map:when test="html">
      <map:transform src="html2whatever.xsl"/>
    </map:when>
    <map:when test="tei">
      <map:transform src="tei2whatever.xsl"/>
    </map:when>
  </map:select>
  <map:serialize type="links"/>
</map:view>
So 'type' would be treated as a property of a sitemap component,
independent of labels. The xml-type selector would somehow discover the
type of XML emitted by its upstream component.
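To make that concrete, here is a sketch of the pipeline side. Both
pipelines put the same "content" label at the point where their
indexable XML appears, and the single links view dispatches on what
comes out of that point. The xml-type selector does not exist; its class
name, like the match patterns and tei2html.xsl, is invented here:

```xml
<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <map:components>
    <map:selectors>
      <!-- HYPOTHETICAL: no such selector ships with Cocoon -->
      <map:selector name="xml-type" src="org.example.cocoon.XMLTypeSelector"/>
    </map:selectors>
  </map:components>
  <map:pipelines>
    <map:pipeline>
      <!-- Emits TEI at the "content" label -->
      <map:match pattern="texts/*.html">
        <map:generate src="texts/{1}.xml" label="content"/>
        <map:transform src="tei2html.xsl"/>
        <map:serialize type="html"/>
      </map:match>
      <!-- Emits HTML at the "content" label -->
      <map:match pattern="pages/*.html">
        <map:generate src="pages/{1}.xml"/>
        <map:transform src="blah-to-html.xsl" label="content"/>
        <map:serialize type="html"/>
      </map:match>
    </map:pipeline>
  </map:pipelines>
</map:sitemap>
```

The label now only says where to tap the pipeline; what kind of XML is
found there is left to the selector to work out, which is the open
question.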
--Jeff
> Cheers
>
> Con
>