RE: System upgrade and now Cocoon is escaping tabs/entities.

2010-10-25 Thread . .

Chris,

So it turned out updating Xalan fixed the problem completely.

We went with Xalan 2.7.1 (which has Xerces 2.9.0 included).

We replaced 'xercesImpl.jar' and 'xml-apis.jar' in Tomcat's endorsed folder, 
swapped 'xalan-2.6.1-dev-20041008T0304.jar' for the 'xalan.jar' from 2.7.1, and 
added 'serializer.jar' alongside it, both in our lib folder.

Restarted Tomcat and the problem went away and nothing else on the site was 
affected. In fact, it seems a little faster now. :)

So now we're running fine on CentOS 5, JDK 1.6.21 and Tomcat 5.0.28.
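
For anyone repeating this jar swap, a minimal sketch of how to see which XSLT 
implementation the JVM actually resolves afterwards. It assumes the lookup goes 
through the standard JAXP TransformerFactory, which may not match how Cocoon 
itself is configured to pick its transformer. The class name XsltImplReport is 
made up for illustration.

```java
import javax.xml.transform.TransformerFactory;
import java.security.CodeSource;

// Hypothetical helper: reports which TransformerFactory JAXP resolves
// and where that class was loaded from.
final class XsltImplReport {

    /** Class name of the factory that the JAXP lookup returns. */
    static String factoryClassName() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /** Jar or directory the factory class came from, if the JVM exposes it. */
    static String factoryLocation() {
        Class<?> cls = TransformerFactory.newInstance().getClass();
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        // A null code source usually means the class ships with the JDK itself.
        return (src == null || src.getLocation() == null)
                ? "(no code source, probably bundled with the JDK)"
                : src.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + factoryClassName());
        System.out.println("Loaded from:        " + factoryLocation());
    }
}
```

Run inside the webapp (for example from a throwaway JSP), the second line 
should point at the new xalan.jar if that is the implementation being picked up.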

- J






Re: System upgrade and now Cocoon is escaping tabs/entities.

2010-09-29 Thread Christopher Schultz

J,

On 9/29/2010 1:10 AM, . . wrote:
 #a9 should be a copyright symbol if you're using ASCII.

 I suspect that #a9 is being used instead of a newline (0xa) followed by
 a tab (0x9).
 
 Actually it was a typo on my part. It's using &#9; :( *oops*

Yeah, that makes a ton of difference. I'm glad it wasn't 0xa9, 'cause
that would have been a real mess. :)

 [file.encoding] is likely to solve both of your problems.
 
 I wrote a little JSP page to spit out the
 System.getProperty("file.encoding") value and got some surprising
 results. I tried two of the existing machines and got ISO-8859-1 for one
 and ANSI_X3.4-1968 for the other.

ANSI_X3.4-1968, as you probably found out, is essentially basic ASCII,
and ISO-8859-1 is ASCII plus a few other things, so they are compatible.
It's not surprising that these two character sets are both working: if
one works, the other has a good chance of working.
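
A small sketch of that compatibility claim, using only java.nio.charset. It 
checks that the JVM treats ANSI_X3.4-1968 as another name for US-ASCII, and that 
pure 7-bit bytes decode the same way under US-ASCII and ISO-8859-1 while a byte 
such as 0xA9 does not. The class name CharsetCompat is made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing US-ASCII and ISO-8859-1 decoding.
final class CharsetCompat {

    /** Canonical JVM name for a charset label such as "ANSI_X3.4-1968". */
    static String canonicalName(String label) {
        return Charset.forName(label).name();
    }

    /** True if the bytes decode to the same text under both charsets. */
    static boolean decodesIdentically(byte[] bytes) {
        String asAscii = new String(bytes, StandardCharsets.US_ASCII);
        String asLatin1 = new String(bytes, StandardCharsets.ISO_8859_1);
        return asAscii.equals(asLatin1);
    }

    public static void main(String[] args) {
        System.out.println("ANSI_X3.4-1968 resolves to: "
                + canonicalName("ANSI_X3.4-1968"));

        byte[] sevenBit = "<a>\ttext</a>".getBytes(StandardCharsets.US_ASCII);
        System.out.println("7-bit bytes decode identically: "
                + decodesIdentically(sevenBit));

        // 0xA9 is outside ASCII, so the two decoders disagree on it.
        byte[] highByte = { (byte) 0xA9 };
        System.out.println("0xA9 decodes identically: "
                + decodesIdentically(highByte));
    }
}
```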

 The application runs fine on both of them. On the new server that too
 is giving out ISO-8859-1.

Interesting.

 That said, we did an experiment last night and copied the entire
 previous Tomcat folder over to the new CentOS server and ran it with Sun
 JDK 1.4.29 - the problem disappeared. When we ran it with JDK 1.5 or 1.6
 the problem manifested itself.
 
 So the problem appears to be related to the JDK in some way. Googling I
 came up with this:
 
 http://stackoverflow.com/questions/1059854/how-do-you-prevent-a-javax-transformer-from-escaping-whitespace
 
 Which makes me wonder if the old Xalan from our previous Tomcat is
 having issues with JDK 1.5 and up. I guess an Xalan upgrade is in order.

Cocoon packages its own Xalan library, so that shouldn't be the
problem, although I can't remember when Sun started packaging Xalan with
Java. At some point, I think they even removed it. What version of Xalan
are you running? It should be in your webapp's WEB-INF/lib directory. I
don't think there's been a Xalan update in quite a few years.

Let us know how things turn out.

 NB: Tomcat 5.0 has been retired and really should be replaced. Upgrading
 to Tomcat 6.0 shouldn't be too much trouble.
 
 Only issue there is we have to support this legacy application for
 another 12 months and it's a hand-me-down so we have little or no
 source code or documentation. Porting it now would take up more
 time/effort than is financially viable right now :(

Technically speaking, servlet containers are supposed to be backward
compatible. I wouldn't be surprised if, given a review of your Context
element for Tomcat (it should go into META-INF/context.xml, now in your
webapp, instead of in conf/server.xml for the server), everything else
works exactly as it did before.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@cocoon.apache.org
For additional commands, e-mail: users-h...@cocoon.apache.org



Re: System upgrade and now Cocoon is escaping tabs/entities.

2010-09-28 Thread Christopher Schultz

J,

On 9/28/2010 10:09 AM, . . wrote:
 Our original application components were:
 
 NetBSD 3.0.3 with Suse 9.x Linux compatibility layer.
 Sun JDK 1.4.26
 Tomcat 5.0.23
 Cocoon 2.1.6
 
 As part of the upgrade we switched to:
 
 CentOS 5.3
 Sun JDK 1.6.21
 Tomcat 5.0.30
 Cocoon 2.1.6

[snip]

 Firstly, if any of our source XML/XSL files use tabs to indent the
 nodes, the outputted source escapes them as &#A9; which it didn't do
 before. This isn't a problem for output to be displayed in a browser but
 we have a number of legacy Flash components which, annoyingly, don't
 recognise this as whitespace and refuse to load, causing the Flash
 component to fail.

#a9 should be a copyright symbol if you're using ASCII.

I suspect that #a9 is being used instead of a newline (0xa) followed by
a tab (0x9).

My guess is that your JVM's file.encoding system property used to be
something like ISO-8859-1 or UTF-8 and now it's been changed to
something that is more exotic, perhaps even mandating 16-bit characters
(though your pages would be horribly jumbled if everything were
interpreted as 16-bit characters).

Check the file.encoding of your JVM in the old, working system relative
to the new, broken one. Also, check to make sure that your XML files
have the encoding set in the <?xml?> processing instruction, and that
the encoding actually matches what you used when you wrote the file to
the disk. Finally, check to see if you have BOM characters at the start
of your XML files.

This is likely to solve both of your problems.
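
The BOM check can be sketched in a few lines of Java. This is only an 
illustration under stated assumptions: the class name BomCheck is made up, the 
files to inspect are passed as command-line arguments, and only the UTF-8 and 
UTF-16 byte order marks are recognised.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: reports whether a file starts with a byte order mark.
final class BomCheck {

    /** Returns the encoding implied by a leading BOM, or "none". */
    static String detectBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(head, 0xFE, 0xFF)) return "UTF-16BE";
        // Note: a UTF-32LE BOM (FF FE 00 00) would also match this case.
        if (startsWith(head, 0xFF, 0xFE)) return "UTF-16LE";
        return "none";
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare as unsigned byte values.
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            // Reads the whole file for simplicity; fine for small XML/XSL files.
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            System.out.println(path + ": BOM = " + detectBom(bytes));
        }
    }
}
```

If a file reports a BOM, its encoding declaration should agree with it.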

NB: Tomcat 5.0 has been retired and really should be replaced. Upgrading
to Tomcat 6.0 shouldn't be too much trouble.

-chris



RE: System upgrade and now Cocoon is escaping tabs/entities.

2010-09-28 Thread . .

 #a9 should be a copyright symbol if you're using ASCII.
 
 I suspect that #a9 is being used instead of a newline (0xa) followed by
 a tab (0x9).

Actually it was a typo on my part. It's using &#9; :( *oops*

 My guess is that your JVM's file.encoding system property used to be
 something like ISO-8859-1 or UTF-8 and now it's been changed to
 something that is more exotic, perhaps even mandating 16-bit characters
 (though your pages would be horribly jumbled if everything were
 interpreted as 16-bit characters).
 
 Check the file.encoding of your JVM in the old, working system relative
 to the new, broken one. Also, check to make sure that your XML files
 have the encoding set in the <?xml?> processing instruction, and that
 the encoding actually matches what you used when you wrote the file to
 the disk. Finally, check to see if you have BOM characters at the start
 of your XML files.
 
 This is likely to solve both of your problems.

I wrote a little JSP page to spit out the System.getProperty("file.encoding") 
value and got some surprising results. I tried two of the existing machines and 
got ISO-8859-1 for one and ANSI_X3.4-1968 for the other. The application runs 
fine on both of them. The new server is also giving out ISO-8859-1.
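
The check itself is tiny. A plain-Java sketch of what such a page prints is 
below. The class name EncodingReport is made up, and the original was a JSP 
rather than a standalone class. It also prints Charset.defaultCharset(), which 
is what most I/O classes fall back to when no charset is given.

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints the JVM's default encoding settings.
final class EncodingReport {

    /** Raw value of the file.encoding system property (may be null). */
    static String fileEncodingProperty() {
        return System.getProperty("file.encoding");
    }

    /** Canonical name of the charset the JVM uses by default. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("file.encoding   = " + fileEncodingProperty());
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```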

That said, we did an experiment last night and copied the entire previous 
Tomcat folder over to the new CentOS server and ran it with Sun JDK 1.4.29 - 
the problem disappeared. When we ran it with JDK 1.5 or 1.6 the problem 
manifested itself.

So the problem appears to be related to the JDK in some way. Googling I came up 
with this:

http://stackoverflow.com/questions/1059854/how-do-you-prevent-a-javax-transformer-from-escaping-whitespace

Which makes me wonder if the old Xalan from our previous Tomcat is having 
issues with JDK 1.5 and up. I guess an Xalan upgrade is in order.
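
To test that theory outside Cocoon, something like the following could be run 
under each JDK. It is a rough sketch, not our actual pipeline: it pushes a 
tab-indented sample document through a JAXP identity transform and reports 
whether the serialized output contains a tab character reference. The class 
name TabEscapeCheck and the sample XML are made up, and it uses whichever 
transformer JAXP resolves, which may differ from the one Cocoon uses.

```java
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper: checks whether the resolved serializer escapes tabs.
final class TabEscapeCheck {

    /** Copies the input document to a string via an identity transform. */
    static String identityTransform(String xml) throws TransformerException {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    /** True if the text contains a decimal or hex character reference for tab. */
    static boolean escapesTabs(String serialized) {
        String lower = serialized.toLowerCase();
        return lower.contains("&#9;") || lower.contains("&#x9;");
    }

    public static void main(String[] args) throws TransformerException {
        String sample = "<root>\n\t<child>text</child>\n</root>";
        String result = identityTransform(sample);
        System.out.println(result);
        System.out.println("Tabs escaped: " + escapesTabs(result));
    }
}
```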

 NB: Tomcat 5.0 has been retired and really should be replaced. Upgrading
 to Tomcat 6.0 shouldn't be too much trouble.

Only issue there is we have to support this legacy application for another 12 
months and it's a hand-me-down so we have little or no source code or 
documentation. Porting it now would take up more time/effort than is 
financially viable right now :(

- J