Re: Serializer for d-o-e?
J.Pietschmann wrote:

> Lenz, Evan wrote:
>
> > I understand why Cocoon disables the use of disable-output-escaping in XSLT. However, in my current project, which involves parsing XML results from Google containing escaped (and non-well-formed) HTML, I need to find a way to disable output escaping for certain sections of text, perhaps based on the presence of a special attribute or PI that I can generate when necessary.
> >
> > Does Cocoon provide a way of parameterizing an existing serializer to do this? Has anyone implemented such a serializer? I would think that such a customization of an existing XML serializer should be pretty simple, but the Cocoon serialization framework is so abstract that I'm having trouble finding the right code to extend or modify.
>
> The answer is quite simple: you can't. D-o-e only works if the XSLT processor serializes the result itself; the information about which text nodes are supposed to be d-o-e'd on output is not transported through the SAX pipelines Cocoon uses for plumbing its components.

JAXP provides two special PIs to handle the case when serialization isn't performed by the XSLT engine:

- <?javax.xml.transform.disable-output-escaping?> to start d-o-e, and
- <?javax.xml.transform.enable-output-escaping?> to stop it.

See also http://cvs.apache.org/viewcvs.cgi/*checkout*/xml-commons/java/external/src/javax/xml/transform/Result.java?rev=1.2

Hope this helps, but use it wisely!

-- 
Sylvain Wallez
Anyware Technologies
http://www.apache.org/~sylvain http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }

---------------------------------------------------------------------
Please check that your question has not already been answered in the FAQ before posting. http://xml.apache.org/cocoon/faq/index.html

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
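The two PI targets named above are exposed in JAXP as the constants Result.PI_DISABLE_OUTPUT_ESCAPING and Result.PI_ENABLE_OUTPUT_ESCAPING. What follows is only a minimal sketch of how a downstream SAX consumer could honor them. The class name DoePiSerializer is hypothetical (it is not a Cocoon or Xalan class), and the sketch ignores attributes, namespaces, and encodings.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Result;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: a tiny SAX-to-text serializer that toggles
// escaping when it sees the JAXP d-o-e processing instructions.
class DoePiSerializer extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean escaping = true;

    @Override
    public void processingInstruction(String target, String data) {
        if (Result.PI_DISABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = false;
        } else if (Result.PI_ENABLE_OUTPUT_ESCAPING.equals(target)) {
            escaping = true;
        }
        // Any other PI is silently dropped in this sketch.
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        out.append('<').append(qName).append('>'); // attributes omitted for brevity
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        out.append("</").append(qName).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        out.append(escaping ? escape(text) : text);
    }

    private static String escape(String s) {
        // '&' must be replaced first so later replacements are not double-escaped.
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses the given XML and returns what this sketch would write out.
    static String serialize(String xml) {
        try {
            DoePiSerializer handler = new DoePiSerializer();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling DoePiSerializer.serialize(...) on a document that wraps escaped markup between the two PIs writes that markup through raw, while text outside the PIs is escaped as usual. Whether a given Cocoon serializer honors these PIs is a separate question that this sketch does not answer.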
Re: Serializer for d-o-e?
Lenz, Evan wrote:

> I understand why Cocoon disables the use of disable-output-escaping in XSLT. However, in my current project, which involves parsing XML results from Google containing escaped (and non-well-formed) HTML, I need to find a way to disable output escaping for certain sections of text, perhaps based on the presence of a special attribute or PI that I can generate when necessary.
>
> Does Cocoon provide a way of parameterizing an existing serializer to do this? Has anyone implemented such a serializer? I would think that such a customization of an existing XML serializer should be pretty simple, but the Cocoon serialization framework is so abstract that I'm having trouble finding the right code to extend or modify.

The answer is quite simple: you can't. D-o-e only works if the XSLT processor serializes the result itself; the information about which text nodes are supposed to be d-o-e'd on output is not transported through the SAX pipelines Cocoon uses for plumbing its components.

One workaround would be to do the opposite: emulate serialization in XSLT and use a text serializer, with some magic so that the client gets a Content-Type of text/html.

J.Pietschmann
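The workaround of emulating serialization in XSLT can be sketched roughly as follows. This is an assumption-laden illustration, not a Cocoon recipe: the stylesheet writes every tag by hand as text, and the JAXP transformer's own text output method stands in for Cocoon's text serializer. The class name TextOutputEmulation is hypothetical. Attributes are ignored, and ordinary text is not re-escaped, so a real stylesheet would need extra templates for both.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: the stylesheet emits markup as plain text, so
// text nodes that contain escaped HTML come out as raw HTML.
class TextOutputEmulation {
    static final String XSLT =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        // Write each element's start and end tag by hand.
        + "<xsl:template match='*'>"
        + "<xsl:text>&lt;</xsl:text><xsl:value-of select='name()'/><xsl:text>&gt;</xsl:text>"
        + "<xsl:apply-templates/>"
        + "<xsl:text>&lt;/</xsl:text><xsl:value-of select='name()'/><xsl:text>&gt;</xsl:text>"
        + "</xsl:template>"
        // Text nodes fall through to the built-in template, which copies
        // them; the text output method then writes them without escaping.
        + "</xsl:stylesheet>";

    static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

The cost of this approach is that the stylesheet takes over the serializer's job, including escaping of any text that should stay escaped, which is probably why it was offered only as a workaround.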
RE: Serializer for d-o-e?
J.Pietschmann wrote:

> The answer is quite simple: you can't. D-o-e only works if the XSLT processor serializes the result itself,

Please re-read my message a little more carefully. It's easy to dismiss it as a top-10 XSLT FAQ, but it isn't.

> the information which text nodes are supposed to be d-o-e'd on output is not transported through the SAX pipelines Cocoon uses for plumbing its components.

Actually, it can be if I just pass that information on as a special attribute (or element or processing instruction). Note that I'm not interested in using xsl:disable-output-escaping. I already understand that I can't and that there are very good reasons why I can't.

An example is in order. Here is what I would like to do:

    <xsl:template match="html-blob">
      <html-blob my:disable-output-escaping="yes">
        <xsl:value-of select="."/>
      </html-blob>
    </xsl:template>

Then I would like a custom serializer to simply check every element (or perhaps only certain elements) for the presence of the attribute in my namespace called my:disable-output-escaping. When its value is "yes", the serializer would output the content of that element without escaping markup characters.

This is a general problem that comes up often enough in the real world that I thought someone might have already implemented such a feature. I recall that the Xalan serializer had some kind of PI-based hack for achieving the same thing.

As it happens, I've already solved my problem at hand by using the Google Appliance's internal XSLT processor (which supports xsl:disable-output-escaping) to generate custom HTML, and then using the HTMLGenerator to load the Google results into Cocoon. Not exactly Web services, but it's at least nice to isolate the hack on the Google side. It may break in rare cases, but at least my site will still only be serving well-formed XHTML :-)

Evan

-----Original Message-----
From: J.Pietschmann [mailto:j3322ptm;yahoo.de]
Sent: Friday, November 08, 2002 10:58 AM
To: [EMAIL PROTECTED]
Subject: Re: Serializer for d-o-e?
Lenz, Evan wrote:

> I understand why Cocoon disables the use of disable-output-escaping in XSLT. However, in my current project, which involves parsing XML results from Google containing escaped (and non-well-formed) HTML, I need to find a way to disable output escaping for certain sections of text, perhaps based on the presence of a special attribute or PI that I can generate when necessary.
>
> Does Cocoon provide a way of parameterizing an existing serializer to do this? Has anyone implemented such a serializer? I would think that such a customization of an existing XML serializer should be pretty simple, but the Cocoon serialization framework is so abstract that I'm having trouble finding the right code to extend or modify.

The answer is quite simple: you can't. D-o-e only works if the XSLT processor serializes the result itself; the information about which text nodes are supposed to be d-o-e'd on output is not transported through the SAX pipelines Cocoon uses for plumbing its components.

One workaround would be to do the opposite: emulate serialization in XSLT and use a text serializer, with some magic so that the client gets a Content-Type of text/html.

J.Pietschmann
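The flag-attribute serializer Evan describes might look something like the sketch below. Nothing here comes from Cocoon: the class name FlagAttributeSerializer and the namespace URI urn:example:my are hypothetical placeholders, and the sketch drops all attributes (including the flag itself) from the output. It only shows the bookkeeping: the flag has to be tracked per open element so that escaping resumes once the flagged element closes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a serializer that skips escaping inside any
// element carrying my:disable-output-escaping="yes".
class FlagAttributeSerializer extends DefaultHandler {
    static final String MY_NS = "urn:example:my"; // placeholder namespace URI

    private final StringBuilder out = new StringBuilder();
    // One entry per open element: true means text inside it is escaped.
    private final Deque<Boolean> escapeStack = new ArrayDeque<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        boolean parentEscapes = escapeStack.isEmpty() || escapeStack.peek();
        boolean flagged = "yes".equals(atts.getValue(MY_NS, "disable-output-escaping"));
        // Descendants of a flagged element inherit the unescaped state.
        escapeStack.push(parentEscapes && !flagged);
        out.append('<').append(name(local, qName)).append('>'); // attributes omitted
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        escapeStack.pop();
        out.append("</").append(name(local, qName)).append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        boolean escape = escapeStack.isEmpty() || escapeStack.peek();
        out.append(escape ? escape(text) : text);
    }

    private static String name(String local, String qName) {
        return (qName == null || qName.isEmpty()) ? local : qName;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Parses the given XML (namespace-aware) and returns the sketch's output.
    static String serialize(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // needed to look the flag up by namespace
            FlagAttributeSerializer handler = new FlagAttributeSerializer();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the flag travels as an ordinary attribute, it survives every SAX stage between the transformer and the serializer, which is the point Evan makes in his reply. The output is only as well-formed as the unescaped content it lets through.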
RE: Serializer for d-o-e?
Hi Geoff,

> I had user-edited, real-world HTML coming out of a database that would definitely have been invalid XML. My first pipeline serialized that result to XML and specified those elements as CDATA sections (configuration param in sitemap). From then on, the bad HTML was unparsed down the pipeline, but was successfully output at the end by the HTML serializer as is.

This sounds like a bug in the HTML serializer rather than a feature... But I'm confused: Are CDATA sections among the types of SAX events that Cocoon passes through its pipelines? They aren't preserved in the XSLT/XPath data model; where are they preserved? Are you saying that the HTMLSerializer looks at a CDATA section event and serializes the value thereof unescaped? If that's the case, then it's broken. Otherwise, I think I must be missing a step in what you did.

> If your aim was to actually clean up the output, could you use jTidy to clean up the results?

I ended up using the HTMLGenerator (which I assume uses JTidy), but only after using xsl:disable-output-escaping with the Google server's internal XSLT processor. So I think my problem is solved.

My original plan had been to take Google's raw XML results and pass them through Cocoon's pipelines, but that was unfeasible because of the isolated bits of escaped, non-well-formed HTML that appear in different elements in the Google XML results. In that case, I could have tried to apply JTidy (to each isolated bit of HTML?), but I'm not sure how I could manage that in the sitemap (multiple extractions from the same source and then aggregating all the results again?), and in any case it would be horribly inefficient even if I were to figure out a way to do it.

Anyway, as I said, my current problem is solved. But I am still interested in the possibility of a custom HTML serializer that will recognize a special flag to disable output escaping. I just don't need it right away :-)

Thanks for the input.
Evan

> Geoff
>
> --- Lenz, Evan [EMAIL PROTECTED] wrote:
>
> > J.Pietschmann wrote:
> >
> > > The answer is quite simple: you can't. D-o-e only works if the XSLT processor serializes the result itself,
> >
> > Please re-read my message a little more carefully. It's easy to dismiss it as a top-10 XSLT FAQ, but it isn't.
> >
> > > the information which text nodes are supposed to be d-o-e'd on output is not transported through the SAX pipelines Cocoon uses for plumbing its components.
> >
> > Actually, it can be if I just pass that information on as a special attribute (or element or processing instruction). Note that I'm not interested in using xsl:disable-output-escaping. I already understand that I can't and that there are very good reasons why I can't.
> >
> > An example is in order. Here is what I would like to do:
> >
> >     <xsl:template match="html-blob">
> >       <html-blob my:disable-output-escaping="yes">
> >         <xsl:value-of select="."/>
> >       </html-blob>
> >     </xsl:template>
> >
> > Then I would like a custom serializer to simply check every element (or perhaps only certain elements) for the presence of the attribute in my namespace called my:disable-output-escaping. When its value is "yes", the serializer would output the content of that element without escaping markup characters.
> >
> > This is a general problem that comes up often enough in the real world that I thought someone might have already implemented such a feature. I recall that the Xalan serializer had some kind of PI-based hack for achieving the same thing.
> >
> > As it happens, I've already solved my problem at hand by using the Google Appliance's internal XSLT processor (which supports xsl:disable-output-escaping) to generate custom HTML, and then using the HTMLGenerator to load the Google results into Cocoon. Not exactly Web services, but it's at least nice to isolate the hack on the Google side.