Re: ObjectRepresentation and String encoding
Hi Jerome,

Jerome Louvel wrote:
> After looking again at your issue with Thierry, we concluded that there is no bug, just some classic character-encoding confusion :-).

OK, that's what I had in mind when I wrote in my last message that "it seems to be an encoding issue which might not even be related to restlets".

> Now, with your Restlet approach, the object is serialized using Java serialization (binary scheme). When it gets deserialized, it restores strings in the JVM as UTF-16 (Java's internal encoding for strings). When you print those strings to your console, there is an issue because the console expects another encoding (ISO-8859-1) and has no way to automatically convert your UTF-16 string.

I thought that Java would convert it to the system's default encoding when writing, e.g., to the console.

Thanks for looking into this and for your explanation!

Best regards,
Hannes
Re: ObjectRepresentation and String encoding
Hi Hannes,

After looking again at your issue with Thierry, we concluded that there is no bug, just some classic character-encoding confusion :-).

What happens is probably this: when you use your plain socket, your objects are serialized to XML, which contains metadata about the encoding used by the client. When the server receives and deserializes the XML, it can decode it to the local character encoding when printing its content.

Now, with your Restlet approach, the object is serialized using Java serialization (binary scheme). When it gets deserialized, it restores strings in the JVM as UTF-16 (Java's internal encoding for strings). When you print those strings to your console, there is an issue because the console expects another encoding (ISO-8859-1) and has no way to automatically convert your UTF-16 string.

One solution is to change the encoding of your console to UTF-16. The other is to convert your string to your local encoding before printing. You could use java.io.OutputStreamWriter for this purpose, wrapping the System.out stream and passing ISO-8859-1 as the encoding.

Some text editors are smart enough to detect the encoding of a text file. That's probably why gedit works for you.

I hope this clarifies the issue. We are closing issue #525 now.

Best regards,
Jérôme Louvel
--
Restlet ~ Founder and Lead developer ~ http://www.restlet.org
Noelios Technologies ~ Co-founder ~ http://www.noelios.com

Hannes Ebner wrote:
> Hi Thierry,
>
> Thierry Boileau wrote:
>> I had a look at the issue and I don't see what's wrong. I was able to send a serialized object from a client using UTF-8 to a server using ISO-8859-1 without encoding issues. Could you send us a reproducible test case, and also send us the trace of the following code on both the client and the server side?
>
> I planned to write a reproducible test case, but I didn't get that far. I tried your small application first (the one attached to the bug report) and ran it directly on the server which uses ISO-8859-1. The server sent some data to itself and printed it on the console, so I guess there is no change in the encoding involved. The result that I got on the console was: Une cha�ne de caract�res.
>
> I could also reproduce it on another server with ISO-8859-1, but not on a third one which was configured for UTF-8. With UTF-8 I got the correct string: Une chaîne de caractères.
>
> I also tried to pipe the console output into a file, which I transferred to my development machine (which uses UTF-8). A simple cat of this file on the console showed question marks like above, but when I opened the same file on the same machine with a graphical editor (gedit), the special characters showed up correctly.
>
> I'm confused now; it seems to be an encoding issue which might not even be related to restlets. The questions are now: how do I solve it, and why does it work to transfer the special characters with the old version of my service, which does not use restlets (it uses Object-to-XML serialization over a socket)?
>
> I also attached the requested system properties to this mail, one file with the settings from an ISO-8859-1 server, the other one with UTF-8. I don't know whether it helps you find something; I couldn't see anything strange.
>
> Perhaps you or somebody else on the list has some experience with problems related to character encodings; I'm out of ideas right now.
>
> Best regards,
> Hannes
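
The second suggestion in Jérôme's reply (wrapping System.out in a java.io.OutputStreamWriter with an explicit encoding) could look roughly like the sketch below. The class and method names are made up for illustration, and ISO-8859-1 is only the encoding named in the thread; substitute whatever the console actually expects.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Restlet.
class ConsoleEncodingSketch {

    // Wraps a byte stream (for example System.out) in a writer that encodes
    // characters with the named charset instead of the JVM default.
    static PrintWriter writerFor(OutputStream out, String charsetName) {
        return new PrintWriter(
                new OutputStreamWriter(out, Charset.forName(charsetName)), true);
    }

    // For inspection only: returns the bytes the writer would send to the stream.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintWriter writer = writerFor(buffer, charsetName);
        writer.print(text);
        writer.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the console expects ISO-8859-1, as on the server in this thread.
        PrintWriter console = writerFor(System.out, "ISO-8859-1");
        console.println("Une cha\u00EEne de caract\u00E8res");
    }
}
```

The encode helper is only there to make the effect visible: the same string yields one byte per accented character in ISO-8859-1 and two in UTF-8, so the choice of charset has to match what the terminal decodes.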
Re: ObjectRepresentation and String encoding
Hi Thierry,

Thierry Boileau wrote:
> I had a look at the issue and I don't see what's wrong. I was able to send a serialized object from a client using UTF-8 to a server using ISO-8859-1 without encoding issues. Could you send us a reproducible test case, and also send us the trace of the following code on both the client and the server side?

I planned to write a reproducible test case, but I didn't get that far. I tried your small application first (the one attached to the bug report) and ran it directly on the server which uses ISO-8859-1. The server sent some data to itself and printed it on the console, so I guess there is no change in the encoding involved. The result that I got on the console was: Une cha�ne de caract�res.

I could also reproduce it on another server with ISO-8859-1, but not on a third one which was configured for UTF-8. With UTF-8 I got the correct string: Une chaîne de caractères.

I also tried to pipe the console output into a file, which I transferred to my development machine (which uses UTF-8). A simple cat of this file on the console showed question marks like above, but when I opened the same file on the same machine with a graphical editor (gedit), the special characters showed up correctly.

I'm confused now; it seems to be an encoding issue which might not even be related to restlets. The questions are now: how do I solve it, and why does it work to transfer the special characters with the old version of my service, which does not use restlets (it uses Object-to-XML serialization over a socket)?

I also attached the requested system properties to this mail, one file with the settings from an ISO-8859-1 server, the other one with UTF-8. I don't know whether it helps you find something; I couldn't see anything strange.

Perhaps you or somebody else on the list has some experience with problems related to character encodings; I'm out of ideas right now.

Best regards,
Hannes

java.runtime.name=Java(TM) 2 Runtime Environment, Standard Edition
sun.boot.library.path=/usr/lib/j2sdk1.5-sun/jre/lib/i386
java.vm.version=1.5.0_11-b03
java.vm.vendor=Sun Microsystems Inc.
java.vendor.url=http://java.sun.com/
path.separator=:
java.vm.name=Java HotSpot(TM) Client VM
file.encoding.pkg=sun.io
sun.java.launcher=SUN_STANDARD
user.country=US
sun.os.patch.level=unknown
java.vm.specification.name=Java Virtual Machine Specification
user.dir=/mnt/user/home/ebner/test-case
java.runtime.version=1.5.0_11-b03
java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment
java.endorsed.dirs=/usr/lib/j2sdk1.5-sun/jre/lib/endorsed
os.arch=i386
java.io.tmpdir=/tmp
line.separator=
java.vm.specification.vendor=Sun Microsystems Inc.
os.name=Linux
sun.jnu.encoding=UTF-8
java.library.path=/usr/lib/j2sdk1.5-sun/jre/lib/i386/client:/usr/lib/j2sdk1.5-sun/jre/lib/i386:/usr/lib/j2sdk1.5-sun/jre/../lib/i386
java.specification.name=Java Platform API Specification
java.class.version=49.0
sun.management.compiler=HotSpot Client Compiler
os.version=2.6.18-6-686
user.home=/home/ebner
user.timezone=
java.awt.printerjob=sun.print.PSPrinterJob
file.encoding=UTF-8
java.specification.version=1.5
java.class.path=test.jar
user.name=ebner
java.vm.specification.version=1.0
java.home=/usr/lib/j2sdk1.5-sun/jre
sun.arch.data.model=32
user.language=en
java.specification.vendor=Sun Microsystems Inc.
java.vm.info=mixed mode, sharing
java.version=1.5.0_11
java.ext.dirs=/usr/lib/j2sdk1.5-sun/jre/lib/ext
sun.boot.class.path=/usr/lib/j2sdk1.5-sun/jre/lib/rt.jar:/usr/lib/j2sdk1.5-sun/jre/lib/i18n.jar:/usr/lib/j2sdk1.5-sun/jre/lib/sunrsasign.jar:/usr/lib/j2sdk1.5-sun/jre/lib/jsse.jar:/usr/lib/j2sdk1.5-sun/jre/lib/jce.jar:/usr/lib/j2sdk1.5-sun/jre/lib/charsets.jar:/usr/lib/j2sdk1.5-sun/jre/classes
java.vendor=Sun Microsystems Inc.
file.separator=/
java.vendor.url.bug=http://java.sun.com/cgi-bin/bugreport.cgi
sun.io.unicode.encoding=UnicodeLittle
sun.cpu.endian=little
sun.cpu.isalist=

java.runtime.name=Java(TM) 2 Runtime Environment, Standard Edition
sun.boot.library.path=/usr/lib/jvm/java-1.5.0-sun-1.5.0.14/jre/lib/i386
java.vm.version=1.5.0_14-b03
java.vm.vendor=Sun Microsystems Inc.
java.vendor.url=http://java.sun.com/
path.separator=:
java.vm.name=Java HotSpot(TM) Server VM
file.encoding.pkg=sun.io
sun.java.launcher=SUN_STANDARD
user.country=US
sun.os.patch.level=unknown
java.vm.specification.name=Java Virtual Machine Specification
user.dir=/var/collaborilla-rest
java.runtime.version=1.5.0_14-b03
java.awt.graphicsenv=sun.awt.X11GraphicsEnvironment
java.endorsed.dirs=/usr/lib/jvm/java-1.5.0-sun-1.5.0.14/jre/lib/endorsed
os.arch=i386
java.io.tmpdir=/tmp
line.separator=
java.vm.specification.vendor=Sun Microsystems Inc.
os.name=Linux
sun.jnu.encoding=ISO-8859-1
java.library.path=/usr/lib/jvm/java-1.5.0-sun-1.5.0.14/jre/lib/i386/server:/usr/lib/jvm/java-1.5.0-sun-1.5.0.14/jre/lib/i386:/usr/lib/jvm/java-1.5.0-sun-1.5.0.14/jre/../lib/i386
java.specification.name=Java Platform API Specification
java.class.version=49.0
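
The cat-versus-gedit observation in this message is consistent with one plausible reading, which the thread itself does not confirm: the file held ISO-8859-1 bytes, and the UTF-8 terminal could not decode them, while gedit guessed the right encoding. The sketch below only illustrates that mechanism; the class and method names are invented, and StandardCharsets needs Java 7 or later (the servers in the thread ran Java 1.5, where Charset.forName would be used instead).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
class MisdecodeSketch {

    // Encodes the text as ISO-8859-1, then (wrongly) decodes the bytes as UTF-8,
    // which is roughly what a UTF-8 terminal does with ISO-8859-1 output.
    static String latin1ReadAsUtf8(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    // Decodes the same bytes with the matching charset, as an editor that
    // detects the encoding would.
    static String latin1ReadAsLatin1(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Une cha\u00EEne de caract\u00E8res";
        String garbled = latin1ReadAsUtf8(original);
        // Report whether the replacement character U+FFFD appeared.
        System.out.println("contains U+FFFD: " + (garbled.indexOf('\uFFFD') >= 0));
        System.out.println("matching decode is lossless: "
                + original.equals(latin1ReadAsLatin1(original)));
    }
}
```

If this reading is right, the data itself was never damaged; only the program displaying it picked the wrong charset.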
Re: ObjectRepresentation and String encoding
Hi Hannes,

I had a look at the issue and I don't see what's wrong. I was able to send a serialized object from a client using UTF-8 to a server using ISO-8859-1 without encoding issues.

Could you send us a reproducible test case, and also send us the trace of the following code on both the client and the server side?

    import java.util.Map;

    // Print every JVM system property as key=value.
    for (Map.Entry<Object, Object> entry : System.getProperties().entrySet()) {
        System.out.print(entry.getKey());
        System.out.print("=");
        System.out.println(entry.getValue());
    }

Best regards,
Thierry Boileau
--
Restlet ~ Core developer ~ http://www.restlet.org
Noelios Technologies ~ Co-founder ~ http://www.noelios.com

> Hi Jerome,
>
>> It looks like a bug but after looking at the code I don't see what we are doing wrong as we have no control on encoding for Object serialization. Anyway, I've entered a bug report:
>
> Great, thanks!
>
>> If you could attach a reproducible test case (client+server code), that would help us fix it more quickly. Also, could you add a comment to the report indicating which client and server connectors you are using?
>
> Yes, I will do this in the next few days.
>
> Best regards,
> Hannes
Re: ObjectRepresentation and String encoding
Hi Thierry,

Thierry Boileau wrote:
> Could you send us a reproducible test case, and also send us the trace of the following code on both the client and the server side?

I will try to reproduce it with a small test case and get back to you.

Best regards,
Hannes
RE: ObjectRepresentation and String encoding
Hi Hannes,

It looks like a bug, but after looking at the code I don't see what we are doing wrong, as we have no control over the encoding used for Object serialization. Anyway, I've entered a bug report:

Encoding issue with ObjectRepresentation
http://restlet.tigris.org/issues/show_bug.cgi?id=525

If you could attach a reproducible test case (client+server code), that would help us fix it more quickly. Also, could you add a comment to the report indicating which client and server connectors you are using?

Best regards,
Jerome

-----Original Message-----
From: Hannes Ebner [mailto:[EMAIL PROTECTED]]
Sent: Monday, 30 June 2008 12:37
To: discuss@restlet.tigris.org
Subject: Re: ObjectRepresentation and String encoding

> Hi Stephan,
>
> Stephan Koops wrote:
>> You could explicitly set the character encoding of a representation. Perhaps you have to set ISO-8859-1 on the representation? Use Representation.setCharacterSet(...)
>
> I don't think that this works with serialized objects. I tried to set the character set, but it didn't show up in the HTTP header on the other side.
>
> I tried to serialize the very same object myself (without Restlets involved), and sent it directly via a TCP socket, and it worked. Could this be a bug somewhere in Restlet's ObjectRepresentation?
>
> Best regards,
> Hannes
Re: ObjectRepresentation and String encoding
Hi Jerome,

Jerome Louvel wrote:
> It looks like a bug but after looking at the code I don't see what we are doing wrong as we have no control on encoding for Object serialization. Anyway, I've entered a bug report:

Great, thanks!

> If you could attach a reproducible test case (client+server code), that would help us fix it more quickly. Also, could you add a comment to the report indicating which client and server connectors you are using?

Yes, I will do this in the next few days.

Best regards,
Hannes
Re: ObjectRepresentation and String encoding
Hi Stephan,

Stephan Koops wrote:
> You could explicitly set the character encoding of a representation. Perhaps you have to set ISO-8859-1 on the representation? Use Representation.setCharacterSet(...)

I don't think that this works with serialized objects. I tried to set the character set, but it didn't show up in the HTTP header on the other side.

I tried to serialize the very same object myself (without Restlets involved), and sent it directly via a TCP socket, and it worked. Could this be a bug somewhere in Restlet's ObjectRepresentation?

Best regards,
Hannes
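
For anyone who wants to check the "serialize it myself" experiment without Restlet or a socket, here is a minimal sketch that round-trips a string through plain Java serialization in memory. The class and method names are invented for illustration. Java serialization writes strings in its own fixed format (modified UTF-8) regardless of the platform's default charset, so the round trip should not depend on the machine's locale.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical demo class; no Restlet classes involved.
class SerializationRoundTrip {

    // Serializes the value to a byte array and reads it back again.
    static Object roundTrip(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(value);
            out.close();

            ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()));
            try {
                return in.readObject();
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("serialization failed", e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("class missing on read", e);
        }
    }

    public static void main(String[] args) {
        String original = "Une cha\u00EEne de caract\u00E8res";
        Object copy = roundTrip(original);
        // Compare in memory rather than printing the text, so the console
        // encoding cannot influence the result.
        System.out.println("round trip preserved the string: " + original.equals(copy));
    }
}
```

Comparing the strings in memory, instead of printing them, keeps the console's encoding out of the picture, which is the part that turned out to matter in this thread.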