Re: Serializer for d-o-e?

2002-11-09 Thread Sylvain Wallez
J.Pietschmann wrote:

 Lenz, Evan wrote:

 I understand why Cocoon disables the use of disable-output-escaping in XSLT.
 However, in my current project, which involves parsing XML results from
 Google containing escaped (and non-well-formed) HTML, I need to find a way
 to disable output escaping for certain sections of text, perhaps based on
 the presence of a special attribute or PI that I can generate when
 necessary. Does Cocoon provide a way of parameterizing an existing
 serializer to do this? Has anyone implemented such a serializer? I would
 think that such a customization of an existing XML serializer should be
 pretty simple, but the Cocoon serialization framework is so abstract that
 I'm having trouble finding the right code to extend or modify.


 The answer is quite simple: you can't. D-o-e only works if the
 XSLT processor serializes the result itself, the information
 which text nodes are supposed to be d-o-e'd on output is not
 transported through the SAX pipelines Cocoon uses for plumbing
 its components.


JAXP provides two special PIs to handle the case when serialization
isn't performed by the XSLT engine:
- <?javax.xml.transform.disable-output-escaping?> to start d-o-e and
- <?javax.xml.transform.enable-output-escaping?> to stop it.

See also
http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-commons/java/external/src/javax/xml/transform/Result.java?rev=1.2
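For illustration, a minimal sketch in plain JAXP (not Cocoon code) of how these PIs behave when SAX events are fed straight into an identity TransformerHandler. The PI names are available as the constants Result.PI_DISABLE_OUTPUT_ESCAPING and Result.PI_ENABLE_OUTPUT_ESCAPING. It assumes the JDK's built-in (Xalan-derived) serializer, which is expected to recognise them; the class name, element name and text are made up.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

public class DoePiDemo {
    public static String render() throws Exception {
        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        // Identity TransformerHandler: SAX events in, serialized XML out.
        TransformerHandler th = stf.newTransformerHandler();
        // Output properties must be set before setResult() creates the serializer.
        th.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        AttributesImpl noAttrs = new AttributesImpl();
        th.startDocument();
        th.startElement("", "root", "root", noAttrs);
        char[] normal = "a < b ".toCharArray();
        th.characters(normal, 0, normal.length);        // escaped as usual
        th.processingInstruction(Result.PI_DISABLE_OUTPUT_ESCAPING, "");
        char[] raw = "<b>bold</b>".toCharArray();
        th.characters(raw, 0, raw.length);              // written verbatim
        th.processingInstruction(Result.PI_ENABLE_OUTPUT_ESCAPING, "");
        th.endElement("", "root", "root");
        th.endDocument();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render());
    }
}
```

Note that nothing checks that the raw text is well-formed; that is exactly why the feature should be used sparingly.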

Hope this helps, but use it wisely!

--
Sylvain Wallez  Anyware Technologies
http://www.apache.org/~sylvain   http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }





-
Please check that your question  has not already been answered in the
FAQ before posting. http://xml.apache.org/cocoon/faq/index.html

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail:   [EMAIL PROTECTED]




Re: Serializer for d-o-e?

2002-11-08 Thread J.Pietschmann
Lenz, Evan wrote:

I understand why Cocoon disables the use of disable-output-escaping in XSLT.
However, in my current project, which involves parsing XML results from
Google containing escaped (and non-well-formed) HTML, I need to find a way
to disable output escaping for certain sections of text, perhaps based on
the presence of a special attribute or PI that I can generate when
necessary. Does Cocoon provide a way of parameterizing an existing
serializer to do this? Has anyone implemented such a serializer? I would
think that such a customization of an existing XML serializer should be
pretty simple, but the Cocoon serialization framework is so abstract that
I'm having trouble finding the right code to extend or modify.


The answer is quite simple: you can't. D-o-e only works if the
XSLT processor serializes the result itself, the information
which text nodes are supposed to be d-o-e'd on output is not
transported through the SAX pipelines Cocoon uses for plumbing
its components.
One workaround would be to do the opposite: emulate serializing
in XSLT and use a text serializer, with some magic so that the
client gets a content-type=text/html.
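A minimal sketch of that workaround in plain JAXP (outside Cocoon): the stylesheet writes the markup itself as text, and method="text" copies every character out verbatim. The stylesheet, the class name and the html-blob element are hypothetical, and getting the client a text/html Content-Type (the "magic") is left to the Cocoon configuration and not shown here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TextMethodDemo {
    // Hypothetical stylesheet that "serializes" by emitting tags as plain text.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:text>&lt;div&gt;</xsl:text>"
      + "<xsl:value-of select='doc/html-blob'/>"
      + "<xsl:text>&lt;/div&gt;</xsl:text>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String render(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("<doc><html-blob>&lt;b&gt;bold&lt;/b&gt;</html-blob></doc>"));
    }
}
```

The catch with method="text" is that nothing is escaped at all, so any &, < or > in ordinary content has to be escaped by the stylesheet itself.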

J.Pietschmann






RE: Serializer for d-o-e?

2002-11-08 Thread Lenz, Evan
J.Pietschmann wrote:
 The answer is quite simple: you can't. D-o-e only works if the
 XSLT processor serializes the result itself,

Please re-read my message a little more carefully. It's easy to dismiss it
as a top-10 XSLT FAQ, but it isn't.

 the information
 which text nodes are supposed to be d-o-e'd on output is not
 transported through the SAX pipelines Cocoon uses for plumbing
 its components.

Actually it can be if I just pass that information on as a special attribute
(or element or processing instruction). Note that I'm not interested in
using xsl:disable-output-escaping. I already understand that I can't and
that there are very good reasons why I can't.

An example is in order. Here is what I would like to do:

<xsl:template match="html-blob">
  <html-blob my:disable-output-escaping="yes">
    <xsl:value-of select="."/>
  </html-blob>
</xsl:template>

Then I would like a custom serializer to simply check every element (or
perhaps only certain elements) for the presence of the attribute in my
namespace called my:disable-output-escaping. When its value is yes, then
output the content of that element without escaping markup characters.
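A rough sketch of that behaviour, assuming the SAX stream ends in a JAXP serializer that honours the standard d-o-e PIs (Result.PI_DISABLE_OUTPUT_ESCAPING / PI_ENABLE_OUTPUT_ESCAPING): an XMLFilterImpl strips the flag attribute and brackets the flagged element's content with the two PIs. The namespace URI http://example.com/my and the class name DoeAttributeFilter are hypothetical, and this is not Cocoon's serializer API; inside Cocoon the same logic would have to sit in front of whatever serializer is used.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Turns my:disable-output-escaping="yes" into the JAXP d-o-e PIs. */
public class DoeAttributeFilter extends XMLFilterImpl {
    public static final String MY_NS = "http://example.com/my"; // hypothetical URI
    // One entry per open element: did it carry the flag?
    private final Deque<Boolean> flagged = new ArrayDeque<>();

    public DoeAttributeFilter(XMLReader parent) { super(parent); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        int i = atts.getIndex(MY_NS, "disable-output-escaping");
        boolean doe = i >= 0 && "yes".equals(atts.getValue(i));
        AttributesImpl copy = new AttributesImpl(atts);
        if (i >= 0) copy.removeAttribute(i);   // don't leak the flag into the output
        super.startElement(uri, local, qName, copy);
        if (doe) super.processingInstruction(Result.PI_DISABLE_OUTPUT_ESCAPING, "");
        flagged.push(doe);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        if (flagged.pop()) super.processingInstruction(Result.PI_ENABLE_OUTPUT_ESCAPING, "");
        super.endElement(uri, local, qName);
    }

    public static String run(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        DoeAttributeFilter filter =
            new DoeAttributeFilter(spf.newSAXParser().getXMLReader());

        SAXTransformerFactory stf =
            (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler th = stf.newTransformerHandler(); // identity serializer
        th.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        th.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        th.setResult(new StreamResult(out));

        filter.setContentHandler(th);
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<doc xmlns:my='http://example.com/my'>"
            + "<title>a &lt; b</title>"
            + "<html-blob my:disable-output-escaping='yes'>&lt;b&gt;bold&lt;/b&gt;</html-blob>"
            + "</doc>"));
    }
}
```

Unescaping applies to all text inside the flagged element, including descendants; a stricter version might only honour the flag on a known set of element names.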

This is a general problem that comes up often enough in the real world that
I thought someone might have already implemented such a feature. I recall
that the Xalan serializer had some kind of PI-based hack for attaining the
same.

As it happens, I've already solved my problem at hand by using the Google
Appliance's internal XSLT processor (which supports
xsl:disable-output-escaping) to generate custom HTML, and then using the
HTMLGenerator to load the Google results into Cocoon. Not exactly Web
services, but it's at least nice to isolate the hack on the Google side. It
may break in rare cases, but at least my site will still only be serving
well-formed XHTML :-)

Evan

 -Original Message-
 From: J.Pietschmann [mailto:j3322ptm;yahoo.de]
 Sent: Friday, November 08, 2002 10:58 AM
 To: [EMAIL PROTECTED]
 Subject: Re: Serializer for d-o-e?
 
 Lenz, Evan wrote:
  I understand why Cocoon disables the use of disable-output-escaping in XSLT.
  However, in my current project, which involves parsing XML results from
  Google containing escaped (and non-well-formed) HTML, I need to find a way
  to disable output escaping for certain sections of text, perhaps based on
  the presence of a special attribute or PI that I can generate when
  necessary. Does Cocoon provide a way of parameterizing an existing
  serializer to do this? Has anyone implemented such a serializer? I would
  think that such a customization of an existing XML serializer should be
  pretty simple, but the Cocoon serialization framework is so abstract that
  I'm having trouble finding the right code to extend or modify.
 
 The answer is quite simple: you can't. D-o-e only works if the
 XSLT processor serializes the result itself, the information
 which text nodes are supposed to be d-o-e'd on output is not
 transported through the SAX pipelines Cocoon uses for plumbing
 its components.
 One workaround would be to do the opposite: emulate serializing
 in XSLT and use a text serializer, with some magic so that the
 client gets a content-type=text/html.
 
 J.Pietschmann
 
 





RE: Serializer for d-o-e?

2002-11-08 Thread Lenz, Evan
Hi Geoff,

 I had user-edited, real-world HTML coming out of a
 database that would definitely have been invalid XML.
 My first pipeline serialized that result to xml and
 specified those elements as CDATA sections
 (configuration param in sitemap).  From then on, the
 bad html was unparsed down the pipeline, but was
 successfully output at the end by the html serializer
 as is.

This sounds like a bug in the HTML serializer rather than a feature... But
I'm confused: Are CDATA sections among the types of SAX events that Cocoon
passes through its pipelines? They aren't preserved in the XSLT/XPath data
model; where are they preserved? Are you saying that the HTMLSerializer
looks at a CDATA section event and serializes the value thereof unescaped?
If that's the case, then it's broken. Otherwise, I think I must be missing a
step in what you did.
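To make the CDATA question concrete, a small JAXP identity-transform sketch (not Cocoon's HTML serializer) using the standard cdata-section-elements output property, presumably the kind of setting Geoff configured in the sitemap. The element name html-blob and the class name are hypothetical. The text comes out wrapped in a CDATA section, but it is still character data: a parser reading the result back sees the same text, not markup, so whatever an HTML serializer does with CDATA is specific to that serializer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CdataDemo {
    public static String identity(String xml, String cdataElements) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // Text children of the listed elements are written inside CDATA sections.
        t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, cdataElements);
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><html-blob>&lt;b&gt;bold&lt;/b&gt;</html-blob></doc>";
        System.out.println(identity(xml, "html-blob"));
    }
}
```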

 If your aim was to actually clean up the output, could
 you use jTidy to clean up the results?

I ended up using the HTMLGenerator (which I assume uses JTidy), but only
after using xsl:disable-output-escaping with the Google server's internal
XSLT processor. So I think my problem is solved. My original plan had been
to take Google's raw XML results and pass them through Cocoon's pipelines,
but that was unfeasible because of the isolated bits of escaped,
non-well-formed HTML that appear in different elements in the Google XML
results. In that case, I could have tried to apply JTidy (to each isolated
bit of HTML?), but I'm not sure how I could manage that in the sitemap
(multiple extractions from the same source and then aggregating all the
results again?), and in any case it would be horribly inefficient even if I
were to figure out a way to do it.

Anyway, as I said, my current problem is solved. But I am still interested
in the possibility of a custom HTML serializer that will recognize a special
flag to disable output escaping. I just don't need it right away :-)

Thanks for the input.
Evan



 Geoff
 