Hi Geoff,

> I had user-edited "real world" HTML coming out of a
> database that would definitely have been invalid XML.
> My first pipeline serialized that result to XML and
> specified those elements as CDATA sections
> (a configuration parameter in the sitemap). From then on,
> the bad HTML passed unparsed down the pipeline, but was
> successfully output at the end by the HTML serializer
> "as is".

This sounds like a bug in the HTML serializer rather than a feature... But
I'm confused: Are CDATA sections among the types of SAX events that Cocoon
passes through its pipelines? They aren't preserved in the XSLT/XPath data
model; where are they preserved? Are you saying that the HTMLSerializer
looks at a CDATA section event and serializes the value thereof unescaped?
If that's the case, then it's broken. Otherwise, I think I must be missing a
step in what you did.
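For reference on the CDATA question: in SAX2, CDATA section boundaries are not ContentHandler events. They are reported separately through org.xml.sax.ext.LexicalHandler (startCDATA/endCDATA). The text itself still arrives through ordinary characters() calls, with the markup characters already unescaped. A serializer that listens only to ContentHandler therefore cannot tell CDATA content from normal text, but one that also receives lexical events could. The minimal sketch below uses the JDK's own SAX parser to show this. The class and field names are made up for illustration, and it does not show whether a given Cocoon pipeline forwards these lexical events or what its HTMLSerializer does with them.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

public class CdataEvents {

    /** Records text, separating characters that arrived inside a CDATA section. */
    static class Recorder extends DefaultHandler2 {
        private boolean inCdata = false;
        final StringBuilder cdataText = new StringBuilder();
        final StringBuilder plainText = new StringBuilder();
        int cdataSections = 0;

        @Override
        public void startCDATA() { inCdata = true; cdataSections++; }

        @Override
        public void endCDATA() { inCdata = false; }

        @Override
        public void characters(char[] ch, int start, int length) {
            // characters() looks the same for CDATA and ordinary text;
            // only the surrounding LexicalHandler events distinguish them.
            (inCdata ? cdataText : plainText).append(ch, start, length);
        }
    }

    public static Recorder parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        Recorder recorder = new Recorder();
        // CDATA boundaries are reported only to a registered LexicalHandler.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", recorder);
        parser.parse(new InputSource(new StringReader(xml)), recorder);
        return recorder;
    }

    public static void main(String[] args) throws Exception {
        Recorder r = parse("<doc>plain <![CDATA[<b>bad & html]]></doc>");
        System.out.println("CDATA sections: " + r.cdataSections);
        System.out.println("inside CDATA:   " + r.cdataText);
        System.out.println("outside CDATA:  " + r.plainText);
    }
}
```

So CDATA can survive as SAX events, but only on the lexical channel. It is not part of the XSLT/XPath data model, which is why an XSLT step in between would normally lose it.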

> If your aim was to actually clean up the output, could
> you use JTidy to clean up the results?

I ended up using the HTMLGenerator (which I assume uses JTidy), but only
after using xsl:disable-output-escaping with the Google server's internal
XSLT processor. So I think my problem is solved. My original plan had been
to take Google's raw XML results and pass them through Cocoon's pipelines,
but that was infeasible because of the isolated bits of escaped,
non-well-formed HTML that appear in different elements in the Google XML
results. In that case, I could have tried to apply JTidy (to each isolated
bit of HTML?), but I'm not sure how I could manage that in the sitemap
(multiple extractions from the same source and then aggregating all the
results again?), and in any case it would be horribly inefficient even if I
were to figure out a way to do it.

Anyway, as I said, my current problem is solved. But I am still interested
in the possibility of a custom HTML serializer that will recognize a special
flag to disable output escaping. I just don't need it right away :-)
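For the record, here is a rough sketch of the kind of serializer I have in mind. It is plain SAX in Java and not a Cocoon component. Plugging it into a sitemap would mean wrapping it in Cocoon's own serializer classes, which I haven't looked into.

The sketch escapes text normally, except inside an element that carries my:disable-output-escaping="yes". The namespace URI used for "my:" (http://example.org/my) is a made-up placeholder. The sketch also honors the standard JAXP processing instructions whose targets are the constants Result.PI_DISABLE_OUTPUT_ESCAPING and Result.PI_ENABLE_OUTPUT_ESCAPING. I suspect these are the "PI-based hack" in Xalan that I mentioned earlier, but I haven't checked. HTML specifics such as empty elements and entity handling are deliberately left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Sketch: escaping serializer with an opt-out flag. Not a Cocoon serializer. */
public class DoeSerializer extends DefaultHandler {
    // Hypothetical namespace URI bound to the "my:" prefix.
    public static final String MY_NS = "http://example.org/my";

    private final Writer out;
    // One entry per open element: is escaping disabled at this level?
    private final Deque<Boolean> rawStack = new ArrayDeque<>();
    // Set between the JAXP disable/enable output-escaping PIs.
    private boolean rawByPi = false;

    public DoeSerializer(Writer out) { this.out = out; }

    private boolean raw() {
        return rawByPi || (!rawStack.isEmpty() && rawStack.peek());
    }

    private void write(String s) throws SAXException {
        try { out.write(s); } catch (IOException e) { throw new SAXException(e); }
    }

    private static String escape(String s, boolean attr) {
        StringBuilder b = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': b.append("&amp;"); break;
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '"': b.append(attr ? "&quot;" : "\""); break;
                default: b.append(c);
            }
        }
        return b.toString();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        boolean flagged = "yes".equals(atts.getValue(MY_NS, "disable-output-escaping"));
        // Children inherit the parent's setting.
        rawStack.push(flagged || (!rawStack.isEmpty() && rawStack.peek()));
        StringBuilder tag = new StringBuilder("<").append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            if (MY_NS.equals(atts.getURI(i))) continue; // don't emit the private flag
            tag.append(' ').append(atts.getQName(i)).append("=\"")
               .append(escape(atts.getValue(i), true)).append('"');
        }
        write(tag.append('>').toString());
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        rawStack.pop();
        write("</" + qName + ">");
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        String text = new String(ch, start, length);
        write(raw() ? text : escape(text, false));
    }

    @Override
    public void processingInstruction(String target, String data) throws SAXException {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) rawByPi = true;
        else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) rawByPi = false;
        else write("<?" + target + (data == null || data.isEmpty() ? "" : " " + data) + "?>");
    }

    /** Convenience: parse an XML string (namespace-aware) and re-serialize it. */
    public static String serialize(String xml) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        StringWriter w = new StringWriter();
        f.newSAXParser().parse(new InputSource(new StringReader(xml)), new DoeSerializer(w));
        return w.toString();
    }
}
```

Given `<html-blob my:disable-output-escaping="yes">&lt;b&gt;hi</html-blob>` (with "my" bound to the placeholder namespace), this writes `<html-blob><b>hi</html-blob>`. Text outside flagged elements is still escaped. Of course, the output is then only as well-formed as the blob you let through.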

Thanks for the input.
Evan



> Geoff
> 
> --- "Lenz, Evan" <[EMAIL PROTECTED]> wrote:
> > J.Pietschmann wrote:
> > > The answer is quite simple: you can't. D-o-e only works if the
> > > XSLT processor serializes the result itself,
> >
> > Please re-read my message a little more carefully. It's easy to dismiss it
> > as a top-10 XSLT FAQ, but it isn't.
> >
> > > the information
> > > which text nodes are supposed to be d-o-e'd on output is not
> > > transported through the SAX pipelines Cocoon uses for plumbing
> > > its components.
> >
> > Actually it can be if I just pass that information on as a special attribute
> > (or element or processing instruction). Note that I'm not interested in
> > using xsl:disable-output-escaping. I already understand that I can't and
> > that there are very good reasons why I can't.
> >
> > An example is in order. Here is what I would like to do:
> >
> > <xsl:template match="html-blob">
> >   <html-blob my:disable-output-escaping="yes">
> >     <xsl:value-of select="."/>
> >   </html-blob>
> > </xsl:template>
> >
> > Then I would like a custom serializer to simply check every element (or
> > perhaps only certain elements) for the presence of the attribute in my
> > namespace called my:disable-output-escaping. When its value is yes, then
> > output the content of that element without escaping markup characters.
> >
> > This is a general problem that comes up often enough in the real world that
> > I thought someone might have already implemented such a feature. I recall
> > that the Xalan serializer had some kind of PI-based hack for attaining the
> > same.
> >
> > As it happens, I've already solved my problem at hand by using the Google
> > Appliance's internal XSLT processor (which supports
> > xsl:disable-output-escaping) to generate custom HTML, and then using the
> > HTMLGenerator to load the Google results into Cocoon. Not exactly Web
> > services, but it's at least nice to isolate the hack on the Google side. It
> > may break in rare cases, but at least my site will still only be serving
> > well-formed XHTML :-)
> >
> > Evan
> >
> > > -----Original Message-----
> > > From: J.Pietschmann [mailto:j3322ptm@;yahoo.de]
> > > Sent: Friday, November 08, 2002 10:58 AM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: Serializer for d-o-e?
> > >
> > > Lenz, Evan wrote:
> > > > I understand why Cocoon disables the use of disable-output-escaping in
> > > > XSLT. However, in my current project, which involves parsing XML results
> > > > from Google containing escaped (and non-well-formed) HTML, I need to find
> > > > a way to disable output escaping for certain sections of text, perhaps
> > > > based on the presence of a special attribute or PI that I can generate
> > > > when necessary. Does Cocoon provide a way of parameterizing an existing
> > > > serializer to do this? Has anyone implemented such a serializer? I would
> > > > think that such a customization of an existing XML serializer should be
> > > > pretty simple, but the Cocoon serialization framework is so abstract that
> > > > I'm having trouble finding the right code to extend or modify.
> > >
> > > The answer is quite simple: you can't. D-o-e only works if the
> > > XSLT processor serializes the result itself; the information about
> > > which text nodes are supposed to be d-o-e'd on output is not
> > > transported through the SAX pipelines Cocoon uses for plumbing
> > > its components.
> > > One workaround would be to do the opposite: emulate serializing
> > > in XSLT and use a text serializer, with some magic so that the
> > > client gets a content-type=text/html.
> > >
> > > J.Pietschmann
> > >
> > > ---------------------------------------------------------------------
> > > Please check that your question has not already been answered in the
> > > FAQ before posting.     <http://xml.apache.org/cocoon/faq/index.html>
> > >
> > > To unsubscribe, e-mail:     <[EMAIL PROTECTED]>
> > > For additional commands, e-mail:   <[EMAIL PROTECTED]>