RE: Character Encoding problem (umlauts, etc).

2003-09-08 Thread Bodycombe, Andrew
This problem can usually be fixed by changing the file.encoding system
property.
Set CATALINA_OPTS to "-Dfile.encoding=utf-8" (or iso-8859-1 or whatever
character set you like) and restart Tomcat.
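
To check that the option actually reached the JVM, a few lines of Java can print the effective values. This is only a minimal sketch, not part of Tomcat: the class name DefaultEncodingCheck is made up for illustration, and on recent JDKs the default charset may well be UTF-8 regardless of the platform locale.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration; not part of Tomcat.
class DefaultEncodingCheck {
    // Name of the charset the JVM uses when no encoding is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The raw system property, as set by -Dfile.encoding=... or by the JVM itself.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running it once with and once without the -Dfile.encoding switch shows whether the switch changes anything on a given JDK.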

Hope this helps
Andy

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Character Encoding problem (umlauts, etc).

2003-09-08 Thread Thomas Kellerer
Robert Priest wrote:

> I have a servlet that catches a request for a file.

How is the request sent?

If it is sent via an HTML form, you need to include the accept-charset="UTF-8"
attribute in your <form> tag.
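
Why the charset has to match on both sides can be seen without a servlet container. The sketch below is an assumption-laden illustration: it decodes the same percent-encoded bytes once as UTF-8 and once as ISO-8859-1. The class name FormDecodingDemo is hypothetical, and URLDecoder.decode(String, Charset) needs Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; requires Java 10+ for URLDecoder.decode(String, Charset).
class FormDecodingDemo {
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // Three a-umlauts plus ".txt", percent-encoded from their UTF-8 bytes.
        String onTheWire = "%C3%A4%C3%A4%C3%A4.txt";

        String right = decode(onTheWire, StandardCharsets.UTF_8);
        String wrong = decode(onTheWire, StandardCharsets.ISO_8859_1);

        System.out.println(right.length()); // 7: three umlauts plus ".txt"
        System.out.println(wrong.length()); // 10: each umlaut turned into two characters
    }
}
```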

Thomas





RE: Character Encoding problem (umlauts, etc).

2003-09-08 Thread Robert Priest
Thanks for the information, Anton. But just getting rid of umlauts or other
international characters is not an option when you have clients in other
countries who use your software and rely on those special characters. We
cannot rename user files or change that data. That would be very, very bad
:)






Re: Character Encoding problem (umlauts, etc).

2003-09-06 Thread Anton Tagunov
Hello Robert!

Robert Priest <[EMAIL PROTECTED]> wrote:
RP> I am requesting file :
RP> "/38CF278C0186B466222FC48571080B83/51/dms00051/äää.txt"
RP> but what is coming across in the request is:
RP> "/38CF278C0186B466222FC48571080B83/51/dms00051/???.txt"

Probably your browser is sending it that way?
I guess it is a bad idea anyway to type anything nasty
into the browser's URL input line.

You may try to spy on the interaction between browser and
server. I have described how to do it in one of the sections
of my ancient http://tagunov.tripod.com; try to find it there,
and then you'll know for sure what bytes are sent by the browser.

I guess that it is generally a bad idea to have anything
nasty in the URL at all. The closest you could get would be
to percent-encode it all as %AD and so on. But then you need to be
sure which encoding this is (UTF-8 or something else).
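
As a rough illustration of why the encoding has to be known, the sketch below percent-encodes the same file name with two charsets and gets two different strings. The class name PercentEncodeDemo is made up, and URLEncoder.encode(String, Charset) assumes Java 10 or later. Note also that URLEncoder implements HTML form encoding (a space becomes "+"), so it is only an approximation for URL path segments.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; requires Java 10+ for URLEncoder.encode(String, Charset).
class PercentEncodeDemo {
    static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    public static void main(String[] args) {
        // Three a-umlauts, written as escapes so the source file stays pure ASCII.
        String name = "\u00e4\u00e4\u00e4.txt";

        System.out.println(encode(name, StandardCharsets.UTF_8));      // %C3%A4%C3%A4%C3%A4.txt
        System.out.println(encode(name, StandardCharsets.ISO_8859_1)); // %E4%E4%E4.txt
    }
}
```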

So, if these are links from your HTML page, why don't you
encode all in the url directly on the server side and
have 

But then, why don't you get rid of these nasty umlauts altogether?

Why not use only plain Latin letters or, in case you already use
numeric IDs heavily, only numeric IDs?

Anton





RE: Character Encoding problem (umlauts, etc).

2003-09-04 Thread Jeff Tulley
The FAQ ( http://jakarta.apache.org/tomcat/faq ) has a link to a thread on "How to 
UTF-8 your site", which I think might be similar.  
http://marc.theaimsgroup.com/?l=tomcat-user&m=105524426515137&w=2
is the link to the thread itself.  Try some of the things there and see if they work 
for you. (specifically, starting Tomcat with a "-Dfile.encoding=UTF-8" switch)

Jeff Tulley  ([EMAIL PROTECTED])
(801)861-5322
Novell, Inc., The Leading Provider of Net Business Solutions
http://www.novell.com




RE: Character Encoding problem (umlauts, etc).

2003-09-04 Thread Robert Priest
This is in a JSP page (which of course becomes a servlet).

Do I have to set the encoding in Tomcat perhaps?



-Original Message-
From: Robert Priest [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 04, 2003 5:16 PM
To: '[EMAIL PROTECTED]'
Subject: Character Encoding problem (umlauts, etc).


> I have a servlet that catches a request for a file.
> 
> But if that file has characters such as an umlaut in it (for example: ä),
> the path info is all wrong.
> 
> For example:  I am requesting file : 
> 
> "/38CF278C0186B466222FC48571080B83/51/dms00051/äää.txt"
> 
> but what is coming across in the request is:
> 
> "/38CF278C0186B466222FC48571080B83/51/dms00051/???.txt"
> 
> 
> I have tried:
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("ISO-8859-1"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("Unicode"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("UTF8"));
> String requestPathInfo5 = new String(request.getPathInfo().getBytes("UnicodeLittle"));
> 
> 
> But none of them are returning correctly.
> 
> Does anyone know what the correct Unicode encoding is that I should
> use?
> 
> Any other suggestions?
> 
> I know this problem has been solved before, so if you could point me in the
> direction of the solution on the web, that is fine.
> 
> Thanks in advance.
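
Before trying further conversions, it may help to find out what the received string really contains. If the path already holds literal question marks (U+003F), the original bytes are gone and no re-encoding can bring them back. If it holds other unexpected characters, the bytes were merely decoded with the wrong charset. The following is a minimal sketch with a made-up class name, CodePointDump; in the servlet one would pass it the result of request.getPathInfo().

```java
// Hypothetical debugging helper; not part of the servlet API.
class CodePointDump {
    // Renders every char of the string as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A real question mark: the umlaut was lost before the string was built.
        System.out.println(dump("?.txt"));
        // Two characters where one umlaut was expected: UTF-8 bytes read as ISO-8859-1.
        System.out.println(dump("\u00c3\u00a4.txt"));
    }
}
```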




Re: Character Encoding problem

2002-08-30 Thread Irina Lishchenko

On Wednesday 28 August 2002 13:17, you wrote:
> Hi
>
> I am using Tomcat 4.0.4 and JDK 1.3.1.
>
> I am trying to read a parameter in Hebrew from the URL but get '???'. Writing
> Hebrew to the browser works fine.
>
> I cannot use req.setCharacterEncoding(java.lang.String env) (the code does
> not compile when I use it).
>
> Is there a way to work around it? I am very flexible in choosing the JDK and
> Tomcat version to work with, but they need to be release versions and not
> beta or something like that.
>

You can use the following form, just with the right encoding for you. The W3C
has provided the accept-charset attribute for the form element. Then your
request object will have the right encoding too.



~~~
...



ilis





Re: Character Encoding problem

2002-08-28 Thread Bill Barker


"Nehemia Litterat" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
>
> Hi
>
> I am using Tomcat 4.0.4 and JDK 1.3.1.
>
> I am trying to read a parameter in Hebrew from the URL but get '???'. Writing
> Hebrew to the browser works fine.
>
> I cannot use req.setCharacterEncoding(java.lang.String env) (the code does
> not compile when I use it).
>

This is almost certainly due to having an older version of servlet.jar in
your classpath.  One particular gotcha is having an older version of j2ee.jar
and/or servlet.jar in $JAVA_HOME/jre/lib/ext.  If this is the case, kill it.
Otherwise, try compiling with:
javac -classpath $CATALINA_HOME/common/lib/servlet.jar:$CLASSPATH MyServlet.java
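
One way to see which servlet.jar actually wins on the classpath is to ask the JVM where it loaded the class from and whether the method is there. The following is only a sketch: the class ServletApiCheck and its describe method are made up for illustration, and it should be run with the same classpath as the failing compile.

```java
import java.security.CodeSource;

// Hypothetical diagnostic; run it with the classpath used for compiling the servlet.
class ServletApiCheck {
    // Reports where a class was loaded from and whether it has methodName(String).
    static String describe(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK have no code source.
            String origin = (source == null) ? "the JDK itself" : String.valueOf(source.getLocation());
            try {
                type.getMethod(methodName, String.class);
                return className + " from " + origin + " has " + methodName + "(String)";
            } catch (NoSuchMethodException e) {
                return className + " from " + origin + " lacks " + methodName + "(String)";
            }
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("javax.servlet.ServletRequest", "setCharacterEncoding"));
    }
}
```

If the reported location is a jar under jre/lib/ext, or the method is reported as lacking, that would point to the stale jar described above.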

> Is there a way to work around it? I am very flexible in choosing the JDK
> and Tomcat version to work with, but they need to be release versions and not
> beta or something like that.


Tomcat 3.3.1 has excellent charset support, especially if you are willing to
use its non-portable features.

However, once you solve your compilation problem, 4.x should do everything
that you need it to do (and portably as well :).

> Thanks in advance
>
> Nehemia Litterat
>








Re: Character Encoding problem

2002-08-28 Thread Fabio Mengue

Perhaps something like this in /bin/setclasspath.sh

JAVA_OPTS="-Dfile.encoding=ISO-8859-1"

(with your ISO configuration, of course).

Fabio.


-- 
Fabio Mengue - Centro de Computacao - Unicamp
[EMAIL PROTECTED]   [EMAIL PROTECTED]
"Quem se mata de trabalhar merece mesmo morrer." - Millor







RE: Character Encoding Problem

2001-07-02 Thread Tõnu Põld

Hi,

I still believe your initial bytes are converted to Java strings (Unicode)
using the wrong encoding.

If you have a string created from bytes using the "ISO-8859-9" encoding, and
if the JSP page has the directive
<%@ page contentType="text/html; charset=ISO-8859-9" %>, then it should be OK.

For debugging you could try to convert your string to another encoding and see
what happens.
For example:

<%@ page contentType="text/html; charset=ISO-8859-9" %>
<% String s = new String(initialString.getBytes("ISO-8859-1"), "ISO-8859-9"); %>
<%= s %>

If this displays your string correctly, then the "ISO-8859-1" encoding was
used when the Java string was created from the initial bytes!
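
The same round trip can be tried outside a JSP. The sketch below assumes the bytes really were ISO-8859-9 and were decoded as ISO-8859-1. The class RecodeDemo and its recode method are illustrative names, and ISO-8859-9 is assumed to be available in the JVM, which it normally is.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the getBytes/new String round trip described above.
class RecodeDemo {
    // Turns the string back into bytes with the charset that was wrongly assumed,
    // then decodes those bytes with the charset they were actually written in.
    static String recode(String misdecoded, Charset assumed, Charset actual) {
        return new String(misdecoded.getBytes(assumed), actual);
    }

    public static void main(String[] args) {
        Charset latin5 = Charset.forName("ISO-8859-9");

        // Byte 0xFE is U+00FE (thorn) in ISO-8859-1 but U+015F (s with cedilla) in ISO-8859-9.
        String misdecoded = "\u00fe";
        String repaired = recode(misdecoded, StandardCharsets.ISO_8859_1, latin5);
        System.out.println(repaired.equals("\u015f")); // true

        // A character that already arrived as '?' stays '?': nothing is left to recover.
        System.out.println(recode("?", StandardCharsets.ISO_8859_1, latin5)); // ?
    }
}
```

The trick works only because ISO-8859-1 maps every byte value to exactly one character, so no information is lost on the way back to bytes.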

By the way, which version of Tomcat are you using? An older release (3.2.1)
had some bugs with encoding conversion. Try the latest 3.2.2 release.

The request parameters from an HTTP POST are probably decoded as "ISO-8859-1"
because most browsers do not specify the encoding when submitting a request,
so Tomcat uses the default encoding. To convert them correctly to Java
strings, the following could be used (assuming that they really are
"ISO-8859-9"):
String param = new String(initialParam.getBytes("ISO-8859-1"), "ISO-8859-9");

Regards,
Tõnu






RE: Character Encoding Problem

2001-07-02 Thread atumer

> > When reading bytes from file with FileReader the default character encoding
> > is used.
> > I think you must specify your own encoding when reading the file.
> >
> 
> I'll try that. But the same compiled classes and the same JDK version work well
> with Resin JSP Server and the files. The problem occurs with Tomcat.
> 

Nay, still problems. A lot of ?'s in the visual output. Himpf. The interesting
point is that if I have an HTML form with ISO-8859-9 encoded chars, post it to a
JSP file, and write the parameters to a system text file, the text is correct!!

The problem seems to occur when displaying strings obtained from inside a class ..
Any other display problems with other character sets ?? .. Any more ideas ?? ..

Let's recall the problem. We cannot display ISO-8859-9 encoded string constants of a
class, or strings read from a file, in JSP documents in the correct encoding.
Using <%@ page contentType does not solve it.
Using javac -encoding does not solve it.
Using encodings in the file reads/writes of the java.io routines does not solve it.

The problem is with Tomcat; there are no problems with Resin used in the same
environment, OS/JDK.

Arif ..




RE: Character Encoding Problem

2001-07-02 Thread atumer

> From: Tõnu Põld <[EMAIL PROTECTED]>
> Date: 2001/07/02 Mon AM 11:55:51 GMT+03:00
> To: "'[EMAIL PROTECTED]'" <[EMAIL PROTECTED]>
> Subject: RE: Re: RE: Character Encoding Problem
> 
> Just a thought about the  try to place the "<%@page contentType=" in each file you include?

The line is there, in all files :) ..

> 
> 
> When reading bytes from file with FileReader the default character encoding
> is used.
> I think you must specify your own encoding when reading the file.
>

I'll try that. But the same compiled classes and the same jdk version works well with 
Resin JSP Server and the files. The problem occurs with Tomcat.
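
The suggestion quoted above, naming the encoding explicitly instead of relying on FileReader's default, might look like the following sketch. The class ExplicitCharsetRead and its methods are illustrative names. On the JDKs of this thread one would wrap a FileInputStream in an InputStreamReader with a charset name instead of using java.nio.file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo; reads a file with an explicitly named charset.
class ExplicitCharsetRead {
    static String readFirstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    // Writes two ISO-8859-9 bytes to a temporary file and reads them back.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("latin5-sample", ".txt");
            try {
                // 0xFE and 0xF0 are s-with-cedilla and g-with-breve in ISO-8859-9.
                Files.write(tmp, new byte[] {(byte) 0xFE, (byte) 0xF0});
                return readFirstLine(tmp, Charset.forName("ISO-8859-9"));
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo().equals("\u015f\u011f")); // true
    }
}
```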

Thanks,

Arif ..




RE: Character Encoding Problem

2001-07-02 Thread ohamali

Hi,

  Another problem related to the charset type occurs when I use the following code
 
  strErrorMsg = "a message using ISO-8859-9";
  

and post the form, the receiving JSP file does not print the
strErrorMsg variable correctly.

How can this problem be solved?

Thanks,

Oner Necip Hamali.
> 
> From: <[EMAIL PROTECTED]>
> Date: 2001/07/02 Mon PM 12:16:03 GMT+03:00
> To: [EMAIL PROTECTED]
> Subject: Re: RE: Character Encoding Problem
> 
> 
> I tried, a few minutes ago, but the problem remains. I don't
> know the usage of the -encoding parameter in detail, but
> I think it is related to the string constants in the
> source files. The main problem is the encoding when I read
> a text file and show its contents in the JSP with something like
> 
> <%= FileReader.readLine() %>
> 
> And as I mentioned, when I load a file and write it into another file, the
> encoding is correct. The results of my string manipulation functions are
> correct. The single, devastating problem occurs when I pass a string with
> characters in ISO-8859-9 from a class to a JSP.
> 
> Ah,  
> Himm .. It is a stupid thought, but does it have something to do with dynamic
> strings vs. static strings? As far as I know, when <%@ include.. is used the
> file is statically included,
> at compile time, but compilation. Also, the string constants in JSP files are
> compiled, but strings read from files are
> read and generated at execution time .. Himmm .. I think I am going paranoid :) ..
> 
> Thanks, anyway .. Has anyone encountered this kind of problem before ??
> 
> Arif ..
> 
> 




RE: Character Encoding Problem

2001-07-02 Thread Tõnu Põld

Hi,

You should compile the Java classes with ISO-8859-9 encoding.
Look at the -encoding flag of the 'javac' compiler.

During compilation the 8-bit characters in strings are converted to Unicode
characters.
By default javac uses the platform's default encoding, which is probably
ISO-8859-1.
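
While debugging, one way to take the compiler's source encoding out of the picture is to write the non-ASCII constants as \uXXXX escapes. The source file is then pure ASCII and compiles to the same string whatever -encoding is in effect. This is a small sketch with a made-up class name, EscapedConstants.

```java
// Hypothetical example class; the source file contains ASCII characters only.
class EscapedConstants {
    // s-with-cedilla and g-with-breve, written as Unicode escapes.
    static final String SAMPLE = "\u015f\u011f";

    public static void main(String[] args) {
        // Print the code points rather than the characters, so the console
        // encoding cannot distort the result.
        for (int i = 0; i < SAMPLE.length(); i++) {
            System.out.println(String.format("U+%04X", (int) SAMPLE.charAt(i)));
        }
    }
}
```

If such a constant still shows up as ?'s in the page, the damage is happening on output rather than at compile time.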

Regards,
Tõnu


> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 02, 2001 10:33 AM
> To: [EMAIL PROTECTED]
> Subject: Character Encoding Problem
> 
> 
> Hi,
> 
> I know that this is a popular (!?) problem in Tomcat. But 
> despite my efforts I could not find any solution. Here it goes:
> 
> We have a JSP page in the ISO-8859-9 encoding. With the line
> <%@ page contentType = "text/html; charset=ISO-8859-9" %>
> we define the encoding type of the document.
> 
> Result:
> The strings in the jsp file are encoded correctly, like
> - The text in html
> - the text written inside jsp code between <%, %>
> 
> Problem:
> The strings that are received from a class, read from a file,
> or held in a constant are not encoded correctly.
> 
> <%= "Pretend to be an ISO-8859-9 string" %> is encoded correctly, but
> inside a class let's say we have
> String x = "Pretend to be an ISO-8859-9 string";
> then
> <%= myClass.x %> is encoded wrong, with many ?'s
> 
> The problem is not in JVM/JDK, I think, as I read a file and 
> write the content into another file, both files are the same 
> with no loss in encoding.
> 
> Also the encoding problem is there when I include files with
>  If I include the file with <%@ include .. the problem is
> not there .. But I really need to use  
> Can anyone solve my problem ?? ..
> 
> Arif Tumer ..
>