Re: Basic Authentication Failed with multibyte username

2010-01-22 Thread Christopher Schultz
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

André,

On 1/21/2010 6:35 PM, André Warnier wrote:
 Basically, I would tend to say that if the server knows who the clients
 are and vice-versa, you should be free to use any encoding you want,
 with the limitation that what is exchanged on the wire conforms to HTTP
 (because there may be proxies on the way which are not so tolerant).

+1

 What the client is sending is already (in a way) conformant to HTTP,
 because it is base64 encoded and so, on the surface, it does not contain
 non-ascii characters.

+1

 But the problem is that the standard Tomcat code which decodes the Basic
 Authorization header does not work in the way you want, for these
 illegal headers.
 And this code should preferably not be changed in a way which breaks the
 conformance with standard HTTP.
 Because if you do that, then your Tomcat becomes useless for anything
 other than your special client.

+1

Another possibility would be to use something like SecurityFilter, which
allows you to (more easily) write your own authenticator and realm
implementations, and you could write a BasicAuthenticator that reads
these specially-formatted credentials.

I checked the sf source, and it looks like we might have a bug:

   private String decodeBasicAuthorizationString(String authorization) {
      if (authorization == null ||
            !authorization.toLowerCase().startsWith("basic ")) {
         return null;
      } else {
         authorization = authorization.substring(6).trim();
         // Decode and parse the authorization credentials
         return new String(Base64.decodeBase64(authorization.getBytes()));
      }
   }

That authorization.getBytes() is just asking for trouble, because it
uses the platform default encoding to convert characters to bytes. It
should be using US-ASCII, ISO-8859-1, or something like that.

It also calls the String constructor with a byte array without
specifying the encoding, therefore using the platform default.

Finally, this method is private, which means it cannot be overridden by
a subclass, which would be a nice feature. Maybe I'll fix all that. :)
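For illustration, here is one way the method could look with both charsets made explicit and the method opened up to subclasses. This is a sketch, not the actual SecurityFilter fix: it swaps the commons-codec Base64 class for java.util.Base64 (Java 8+), and the class name BasicCredentialsDecoder and its constructor parameter are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

// Hypothetical class; a sketch only. Uses java.util.Base64 instead of commons-codec.
class BasicCredentialsDecoder {

    // Charset used to turn the decoded octets into a String (assumed configurable).
    private final Charset credentialCharset;

    BasicCredentialsDecoder(Charset credentialCharset) {
        this.credentialCharset = credentialCharset;
    }

    // protected instead of private, so a subclass can override the decoding.
    protected String decodeBasicAuthorizationString(String authorization) {
        // Locale.ROOT avoids locale-sensitive lower-casing ("BASIC" in a Turkish locale).
        if (authorization == null
                || !authorization.toLowerCase(Locale.ROOT).startsWith("basic ")) {
            return null;
        }
        String token = authorization.substring(6).trim();
        // The base64 alphabet is pure ASCII, so US-ASCII is always safe here.
        byte[] octets = Base64.getDecoder().decode(token.getBytes(StandardCharsets.US_ASCII));
        // Explicit charset instead of the platform default.
        return new String(octets, credentialCharset);
    }
}
```

Which charset to pass in is exactly the open question in this thread: ISO-8859-1 maps each octet to one char, while UTF-8 matches clients like the OP's JAX-WS program.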

 Or, you drop the container-managed security, and you use something like
 the SecurityFilter (http://securityfilter.sourceforge.net/), but read
 the homepage carefully first.

Note that the warning about BASIC authentication is waaay outdated: sf
definitely does support BASIC auth.

- -chris
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.10 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAktZy68ACgkQ9CaO5/Lv0PAdMACfVnkkBJRIo8Gt1LcsegO/JhPD
Tl0AoLcI5QP0XoCa8kgy5zFJnkKBvL6Y
=CBKO
-END PGP SIGNATURE-

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Basic Authentication Failed with multibyte username

2010-01-22 Thread André Warnier

Christopher Schultz wrote:


André,

On 1/21/2010 6:35 PM, André Warnier wrote:

Basically, I would tend to say that if the server knows who the clients
are and vice-versa, you should be free to use any encoding you want,
with the limitation that what is exchanged on the wire conforms to HTTP
(because there may be proxies on the way which are not so tolerant).


+1


What the client is sending is already (in a way) conformant to HTTP,
because it is base64 encoded and so, on the surface, it does not contain
non-ascii characters.


+1


But the problem is that the standard Tomcat code which decodes the Basic
Authorization header does not work in the way you want, for these
illegal headers.
And this code should preferably not be changed in a way which breaks the
conformance with standard HTTP.
Because if you do that, then your Tomcat becomes useless for anything
other than your special client.


+1

Another possibility would be to use something like SecurityFilter, which
allows you to (more easily) write your own authenticator and realm
implementations, and you could write a BasicAuthenticator that reads
these specially-formatted credentials.

I checked the sf source, and it looks like we might have a bug:

   private String decodeBasicAuthorizationString(String authorization) {
      if (authorization == null ||
            !authorization.toLowerCase().startsWith("basic ")) {
         return null;
      } else {
         authorization = authorization.substring(6).trim();
         // Decode and parse the authorization credentials
         return new String(Base64.decodeBase64(authorization.getBytes()));
      }
   }

That authorization.getBytes() is just asking for trouble, because it
uses the platform default encoding to convert characters to bytes. It
should be using US-ASCII, ISO-8859-1, or something like that.


-1
I don't think you have a problem there, because what you are decoding 
into bytes there IS bytes (it is base64-encoded).
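André's argument can be checked directly. A base64 token only contains characters from the ASCII range, so US-ASCII, ISO-8859-1 and UTF-8 (and any other ASCII-compatible default) all turn it into the same bytes. The exception would be a platform default that is not ASCII-compatible, such as an EBCDIC code page; in the sketch below UTF-16 stands in for that case. The helper name is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: do two charsets turn the same base64 token into identical bytes?
class Base64TokenBytes {
    static boolean sameBytes(String token, Charset a, Charset b) {
        return Arrays.equals(token.getBytes(a), token.getBytes(b));
    }
}
```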




It also calls the String constructor with a byte array without
specifying the encoding, therefore using the platform default.


+1
That is indeed where you have a problem.  There you SHOULD always decode 
it as US-ASCII (or maybe iso-8859-1, I'm not quite sure what the spec 
says exactly).



Let's say that the spec is clear and says that the header value is 
*TEXT, and that *TEXT is always US-ASCII (or ISO-8859-1) by default.


Let's take it from the browser side first.
If the userid:password is indeed composed only of us-ascii characters, 
then the browser base64-encodes this directly and it is trivial.(*)


But let's say that userid:password is something else than us-ascii.
Another part of the spec says that then, you have to encode it according 
to RFC2047.
My contention is then that the browser should first RFC2047-encode 
userid:password, and then base64-encode the result.


Back on the server side.
The server base64-decodes the authorization token into an ASCII string.
It can always do that, because either the string was ASCII to start
with, or else it was not, but then it has been RFC2047-encoded, yielding
a result that is ASCII.

(like : =?iso-8859-2?B?base64-encoded stuff...?= )

Then the server must do another round of decoding via RFC2047.
That consists of a double decoding again : base64-decode the string 
between the ?? into bytes, and then decode those bytes into Unicode, 
using the charset indicated at the beginning of the rfc2047-encoded 
sequence.
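A minimal sketch of that second decoding round, for a single encoded-word using the "B" (base64) encoding. It assumes the whole decoded token is one encoded-word; the "Q" encoding, multiple adjacent encoded-words and RFC 2231 language tags are not handled, and EncodedWordDecoder is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes one RFC 2047 "B" encoded-word, e.g. =?utf-8?B?w7pzZXI=?=
class EncodedWordDecoder {
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?[Bb]\\?([A-Za-z0-9+/=]*)\\?=");

    static String decode(String word) {
        Matcher m = ENCODED_WORD.matcher(word.trim());
        if (!m.matches()) {
            // Not an encoded-word: treat it as plain ASCII and return it unchanged.
            return word;
        }
        Charset charset = Charset.forName(m.group(1));          // charset named in the word
        byte[] octets = Base64.getDecoder().decode(m.group(2)); // base64 text between the ?..?
        return new String(octets, charset);                     // bytes to Unicode
    }
}
```

Anything that does not look like an encoded-word is returned unchanged, which covers the plain ASCII case.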



The above, I believe, would be totally consistent with the current RFCs.

But there is a major catch : I don't believe that there is a browser on 
the market today, which properly encodes the userid:password string 
via rfc2047 when it isn't ascii.


And the OP's special client sends UTF-8, but also does not 
rfc2047-encode it.






Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier

Auth Gábor wrote:

Hi,

I've found a potential bug in the Basic Authentication module. I have users,
and some users' usernames contain national characters (encoded in UTF-8).
HTTP header based authentication fails when the username or the
password contains multibyte characters.


The root of the bug is the Base64 decoder, which decodes the Base64 stream to
a char array: it converts each byte to an individual char, and this decoding
corrupts the multibyte characters...



Hi.
Before declaring that this is a bug, I suggest that you read the other
thread entitled "mod_jk codepage in header values".
The main point is : according to the HTTP RFCs, a HTTP header value is 
supposed to contain /only/ US-ASCII characters. Some byte values in 
UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking 
and according to the RFC, HTTP headers which would contain them are invalid.

It's a pain, but it's (probably) not a bug.





Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas
On 21/01/2010 05:54, Auth Gábor wrote:
 Hi,
 
 I've found a potential bug in the Basic Authentication module. I have users,
 and some users' usernames contain national characters (encoded in UTF-8).
 HTTP header based authentication fails when the username or the
 password contains multibyte characters.

That sounds like a bug to me.

 The root of the bug is the Base64 decoder, which decodes the Base64 stream to
 a char array: it converts each byte to an individual char, and this decoding
 corrupts the multibyte characters...

And that sounds like the root cause.

 It works, because the byte[] to String conversion supports the multibyte 
 conversion and uses the encoding of the JVM.
 
 What do you think about it?

I haven't tested it or looked at the detail of the base 64 decoding but
on the basis it works for you then...

Great! Many thanks. Please create a Bugzilla entry and add your patch to
it. Patches sent to the mailing list are too easy to forget.

Before you do, I have one improvement suggestion. Using the
platform default encoding to convert bytes to String is something that
itself has caused bugs in the past and I can see it doing so here too.
I'd suggest adding a characterEncoding attribute to the
BasicAuthenticator (like there is for FormAuthenticator). Don't forget
to include documentation for this new attribute in your patch.

The tricky question is what should the default be. I see the options as
ISO-8859-1 or UTF-8. I'd use UTF-8 since that will work for most input
including all ISO-8859-1 input.
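On the choice of default: every ISO-8859-1 character can be represented in UTF-8, but whether a UTF-8 default "works" also depends on which bytes the client actually sends. A client that encodes the credentials in ISO-8859-1 produces octets above 0x7F that are not valid UTF-8 on their own. A small sketch (the class name is hypothetical) that encodes on one side and decodes on the other:

```java
import java.nio.charset.Charset;

// Hypothetical helper: encode with the client's charset, decode with the server's.
class CharsetRoundTrip {
    static String roundTrip(String text, Charset client, Charset server) {
        // new String(byte[], Charset) replaces malformed input with U+FFFD
        return new String(text.getBytes(client), server);
    }
}
```

For "é" (U+00E9): UTF-8 on both sides gives "é" back; ISO-8859-1 in and UTF-8 out yields the replacement character U+FFFD; UTF-8 in and ISO-8859-1 out gives "Ã©".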

Thanks again for the patch.

Mark




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Auth Gábor
Hi,

André Warnier wrote:
 I've found a potential bug in the Basic Authentication module. I have
 users, and some users' usernames contain national characters (encoded
 in UTF-8). HTTP header based authentication fails when the
 username or the password contains multibyte characters.

 The root of the bug is the Base64 decoder, which decodes the Base64
 stream to a char array: it converts each byte to an individual char, and
 this decoding corrupts the multibyte characters...

 Before declaring that this is a bug, I suggest that you read the other
 thread entitled "mod_jk codepage in header values".

  I've read that.

 The main point is : according to the HTTP RFCs, a HTTP header value is
 supposed to contain /only/ US-ASCII characters. Some byte values in
 UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
 and according to the RFC, HTTP headers which would contain them are
 invalid. It's a pain, but it's (probably) not a bug.

Hmm... the Basic Authorization header looks like this:
Authorization: BASIC w7pzZXJfMDA3MjpqZWxzem8xMkFB   
 

Where do you see non US-ASCII character in the header? :)
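Decoding the token from that header makes the point concrete: it yields 21 octets starting with 0xC3 0xBA, which is the UTF-8 encoding of "ú". The header itself is plain ASCII; the ambiguity only appears once those octets have to become a Java String. A small sketch (hypothetical class name, java.util.Base64 from Java 8+):

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Decodes the token from the header above with a caller-chosen charset.
class HeaderTokenDemo {
    static final String TOKEN = "w7pzZXJfMDA3MjpqZWxzem8xMkFB";

    static byte[] octets() {
        return Base64.getDecoder().decode(TOKEN);
    }

    static String decodeWith(Charset charset) {
        return new String(octets(), charset);
    }
}
```

With UTF-8 the result is "úser_0072:jelszo12AA"; with ISO-8859-1 the same octets read as "Ãºser_0072:jelszo12AA", which matches the byte-per-char behaviour described in the original report.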

Gábor Auth




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas
On 21/01/2010 06:12, André Warnier wrote:
 Auth Gábor wrote:
 Hi,

 I've found a potential bug in the Basic Authentication module. I have
 users, and some users' usernames contain national characters
 (encoded in UTF-8). HTTP header based authentication fails when
 the username or the password contains multibyte characters.

 The root of the bug is the Base64 decoder, which decodes the Base64
 stream to a char array: it converts each byte to an individual char, and
 this decoding corrupts the multibyte characters...

 Hi.
 Before declaring that this is a bug, I suggest that you read the other
 thread entitled "mod_jk codepage in header values".
 The main point is : according to the HTTP RFCs, a HTTP header value is
 supposed to contain /only/ US-ASCII characters. Some byte values in
 UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
 and according to the RFC, HTTP headers which would contain them are
 invalid.
 It's a pain, but it's (probably) not a bug.

In this case I think it is a bug. The authorisation header is base64
encoded so it is automatically compliant with RFC2616.

Mark




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier

Mark Thomas wrote:

On 21/01/2010 06:12, André Warnier wrote:

Auth Gábor wrote:

Hi,

I've found a potential bug in the Basic Authentication module. I have
users, and some users' usernames contain national characters
(encoded in UTF-8). HTTP header based authentication fails when
the username or the password contains multibyte characters.

The root of the bug is the Base64 decoder, which decodes the Base64
stream to a char array: it converts each byte to an individual char, and
this decoding corrupts the multibyte characters...


Hi.
Before declaring that this is a bug, I suggest that you read the other
thread entitled "mod_jk codepage in header values".
The main point is : according to the HTTP RFCs, a HTTP header value is
supposed to contain /only/ US-ASCII characters. Some byte values in
UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
and according to the RFC, HTTP headers which would contain them are
invalid.
It's a pain, but it's (probably) not a bug.


In this case I think it is a bug. The authorisation header is base64
encoded so it is automatically compliant with RFC2616.


Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something which puzzles me in principle.
Suppose that the browser and the server know nothing particular about
one another, and that the server gets such an Authorization header from
the browser.

The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, to be translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array?
If you use the default platform JVM encoding, you are making the 
assumption that the browser knew what this encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate in which 
encoding this string was, before it encoded it using Base64, or did it ?







Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas
On 21/01/2010 06:55, André Warnier wrote:
 Mark Thomas wrote:
 The authorisation header is base64
 encoded so it is automatically compliant with RFC2616.

 Yes, it sounds like you're right; my mistake.
 (Also for Gabor, I admit my mistake.)
 
 I agree that the HTTP header itself is correct.
 But there is still something which puzzles me in principle.
 Suppose that the browser and the server know nothing particular about
 one another, and that the server gets such an Authorization header from
 the browser.
 The Base64 decoding is done, and yields a series of bytes.
 Now this series of bytes has to be interpreted, to be translated into a
 string in Java (which is Unicode).  Which encoding should be chosen to
 decode the byte array ?
 If you use the default platform JVM encoding, you are making the
 assumption that the browser knew what this encoding is, aren't you ?
 On the other hand, the browser sent nothing to indicate in which
 encoding this string was, before it encoded it using Base64, or did it ?

RFC2617 to the rescue...

  basic-credentials = base64-user-pass
  base64-user-pass  = <base64 [4] encoding of user-pass,
                       except not limited to 76 char/line>
  user-pass         = userid ":" password
  userid            = *<TEXT excluding ":">
  password          = *TEXT

*TEXT is defined in RFC2616

   TEXT   = <any OCTET except CTLs,
             but including LWS>

and finally

   OCTET  = <any 8-bit sequence of data>
   CTL    = <any US-ASCII control character
             (octets 0 - 31) and DEL (127)>

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.
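Read literally, the grammar gives the server a simple recipe: base64-decode, reject CTL octets, and split at the first ":" (the userid cannot contain one, the password can). A sketch under those assumptions, mapping octets to chars with ISO-8859-1 as the spec-compliant default. UserPass is a hypothetical name, and CR/LF line folding (part of LWS) is simply rejected rather than handled.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: split decoded Basic credentials per RFC 2617,
//   user-pass = userid ":" password, userid = *<TEXT excluding ":">
class UserPass {
    final String userid;
    final String password;

    private UserPass(String userid, String password) {
        this.userid = userid;
        this.password = password;
    }

    static UserPass parse(byte[] octets) {
        for (byte b : octets) {
            int v = b & 0xFF;
            // TEXT excludes CTLs (0-31 and 127). HT (9) is allowed as LWS;
            // CR and LF are rejected, so line folding is not supported here.
            if ((v < 32 && v != 9) || v == 127) {
                throw new IllegalArgumentException("CTL octet in credentials: " + v);
            }
        }
        // ISO-8859-1 maps each of the 256 octet values to exactly one char.
        String s = new String(octets, StandardCharsets.ISO_8859_1);
        // The userid cannot contain ':', so the first ':' is the separator.
        int colon = s.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' separator");
        }
        return new UserPass(s.substring(0, colon), s.substring(colon + 1));
    }
}
```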

Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.

Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.

Mark




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier

Mark Thomas wrote:

On 21/01/2010 06:55, André Warnier wrote:

Mark Thomas wrote:

The authorisation header is base64
encoded so it is automatically compliant with RFC2616.


Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something which puzzles me in principle.
Suppose that the browser and the server know nothing particular about
one another, and that the server gets such an Authorization header from
the browser.
The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, to be translated into a
string in Java (which is Unicode).  Which encoding should be chosen to
decode the byte array ?
If you use the default platform JVM encoding, you are making the
assumption that the browser knew what this encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate in which
encoding this string was, before it encoded it using Base64, or did it ?


RFC2617 to the rescue...

  basic-credentials = base64-user-pass
  base64-user-pass  = <base64 [4] encoding of user-pass,
                       except not limited to 76 char/line>
  user-pass         = userid ":" password
  userid            = *<TEXT excluding ":">
  password          = *TEXT

*TEXT is defined in RFC2616

   TEXT   = <any OCTET except CTLs,
             but including LWS>

and finally

   OCTET  = <any 8-bit sequence of data>
   CTL    = <any US-ASCII control character
             (octets 0 - 31) and DEL (127)>

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.

Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.

Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.

Mark


Let me be even more pernickety :

According to the HTTP 1.1 RFC 2616, HTTP header fields MAY contain *TEXT 
portions representing character sets other than US-ASCII.
But then, such header field values MUST be encoded according to the 
rules of RFC 2047.


RFC 2047 in turn, in section 2, "Syntax of encoded-words", indicates that
this should be done using the form:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here that "utf-8" is the correct name for
the charset.)


(Side note: this is something one does find regularly in email headers, but I
have never seen it used in HTTP headers until now.)


On the other hand, regarding authentication mechanisms, RFC 2616 refers 
to RFC 2617, which itself indicates the following format for an 
authorization header sent by the browser to the server :


Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

When base64-decoded, the above string should look like userid:password.

I did not find in RFC 2617 any specific mention of character set 
encoding, but it itself refers back to RFC 2616 as being the base 
rules. And the base rules in RFC 2616 seem to be that header values are 
US-ASCII unless otherwise indicated.


In other words, my contention is as follows :

- if the userid:password string above contains only US-ASCII characters,
then the above simple form of the header is fine.
- if the userid:password string above contains characters other than
US-ASCII, however, then they should be further encoded, using the rules
of RFC 2047.

This would mean that you should have something like :

Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=

(or, maybe, the other way around : it is the 
QWxhZGRpbjpvcGVuIHNlc2FtZQ string which, when base64-decoded, should 
yield a new string of the form 
=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=, which should then be decoded 
once more to give the userid:password string).
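For concreteness, a sketch that builds both candidate layouts described above. It is purely illustrative: whether any client actually sends such headers is exactly the open question, and the class and method names are hypothetical. RFC 2047 treats charset names case-insensitively, so "UTF-8" and "utf-8" are equivalent; it also limits an encoded-word to 75 characters, which this sketch does not enforce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the two header layouts discussed above.
class Rfc2047BasicHeader {
    // RFC 2047 encoded-word with "B" encoding: =?charset?B?base64?=
    static String encodedWord(String text) {
        return "=?UTF-8?B?"
                + Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8))
                + "?=";
    }

    // Layout 1: the encoded-word itself follows "Basic ".
    static String layoutOuter(String userPass) {
        return "Basic " + encodedWord(userPass);
    }

    // Layout 2: base64 of the encoded-word, so the header stays a plain base64 token.
    static String layoutNested(String userPass) {
        String word = encodedWord(userPass);
        return "Basic " + Base64.getEncoder().encodeToString(word.getBytes(StandardCharsets.US_ASCII));
    }
}
```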


Now, I am not sure that if you pass such a HTTP header, encoded as 
above, from Apache to Tomcat, that the Tomcat getHeader() call will 
properly decode it, using the indicated charset.


And I am not sure either that there exists any browser on the market 
that will encode a userid:password string that way.






Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Auth Gábor
Hi,

Mark Thomas wrote:
   OCTET  = <any 8-bit sequence of data>
   CTL    = <any US-ASCII control character
             (octets 0 - 31) and DEL (127)>
 
 So actually, Tomcat is correct in the current treatment of credentials.
 Therefore, not a bug.

Yes, but UTF-8 encoded text consists of 8-bit sequences of data without
control characters, so IMHO UTF-8 encoded text is TEXT.
 
 Also André's comments regarding ISO-8859-1 were right if considering the
 actual user name and password rather than the header.

Yes, that's right. The default header encoding is ISO-8859-1.

 Supporting other encodings would be a useful enhancement but the default
 will have to be ISO-8859-1 to remain spec compliant. What the browsers
 will do for user names and passwords in other encodings is not defined
 so it will be a case of YMMV.

I've found some information about this issue:
http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

So... this is the real chaos... :)

By the way, my users do not use HTML browsers; they use JAX-WS in their
client program, and JAX-WS sends the authentication data in UTF-8 (as
Opera does), because the default encoding is UTF-8 in the client JVM (and on
the server too).

Gábor Auth




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Christopher Schultz

Gábor,

On 1/21/2010 9:16 AM, Auth Gábor wrote:
 Mark Thomas wrote:
   OCTET  = <any 8-bit sequence of data>
   CTL    = <any US-ASCII control character
             (octets 0 - 31) and DEL (127)>

 So actually, Tomcat is correct in the current treatment of credentials.
 Therefore, not a bug.
 
 Yes, but UTF-8 encoded text consists of 8-bit sequences of data without
 control characters, so IMHO UTF-8 encoded text is TEXT.

Sure, UTF-8 encoded text is TEXT, but you may not get the String value
you expect. André is correct in that non-Latin characters appear to be
unsupported by the HTTP Authorization header.

Now, there /are/ things that can be done to accommodate you. See below.

The patch you posted probably will only work when the platform encoding
is set to UTF-8. Instead, an encoding setting would probably have to be
provided to the BasicAuthenticator to allow the Base64-encoded header
value to use the desired encoding. Actually, the code as it looks right
now does have a bug: the platform default encoding is used to decode
Base64-decoded bytes from the Authorization header. Instead, it should
probably be ASCII or maybe ISO-8859-1.

 Also André's comments regarding ISO-8859-1 were right if considering the
 actual user name and password rather than the header.
 
 Yes, that's right. The default header encoding is ISO-8859-1.

It's ASCII, though ISO-8859-1 is backward-compatible (as is UTF-8).

 I've found some information about this issue:
 http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

Nice that someone looked at actual behavior of the browsers.

It would be pretty trivial to add a settable charset to the
BasicAuthenticator, and also to allow things like RFC 2047
charset-in-value decoding, though I don't think that's appropriate
because the Base64 value has already been decoded.

- -chris




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier

Christopher Schultz wrote:
...


Nice that someone looked at actual behavior of the browsers.


There is an easy way to find out what really happens.
Gábor,
I presume that you have a workstation set for iso-8859-2 (or whichever 
non iso-8859-1 charset is appropriate for Magyar, I forgot), and a 
browser set up similarly.
Could you get one of these add-ons like Fiddler2 or LiveHttpHeaders, and 
arrange to capture what is sent by the browser in its authorization 
header when you enter a user-id/password containing some characters of 
the range above \x9F ?

That should be the base64 encoding of whatever the browser is sending.
Then of course you'll have to find a way to show us the base64-encoded 
form, and the corresponding non-encoded form of ditto (but I think that 
composing and sending your post as UTF-8 should do the trick).
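One way to show the bytes without depending on the mail client's charset at all is to post the decoded octets as hex. A small helper for that (hypothetical name, java.util.Base64 from Java 8+):

```java
import java.util.Base64;

// Hypothetical helper: base64-decode a captured Basic token and return
// its octets as space-separated lowercase hex, which survives any mail charset.
class AuthHeaderDump {
    static String hex(String base64Token) {
        byte[] octets = Base64.getDecoder().decode(base64Token.trim());
        StringBuilder sb = new StringBuilder();
        for (byte b : octets) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }
}
```

For example, the first four characters of Gábor's token, "w7pz", come out as "c3 ba 73".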


We could probably do much the same with our own charset-challenged 
browsers, but we don't have the easiest keyboards for that.


It is my deep suspicion that the browsers will just take the input as 
iso-latin-x (whatever the workstation/browser is set for), and 
base64-encode it, without bothering to indicate the real charset in any 
way.  But we'll see.


Kösönöm szepen ("thank you very much"), I think it is...







Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier

To get back to the underlying issue :

Auth Gábor wrote:


So... this is the real chaos... :)


Yes.



By the way, my users do not use HTML browsers; they use JAX-WS in their
client program, and JAX-WS sends the authentication data in UTF-8 (as
Opera does), because the default encoding is UTF-8 in the client JVM (and on
the server too).




Basically, I would tend to say that if the server knows who the clients 
are and vice-versa, you should be free to use any encoding you want, 
with the limitation that what is exchanged on the wire conforms to HTTP 
(because there may be proxies on the way which are not so tolerant).


What the client is sending is already (in a way) conformant to HTTP, 
because it is base64 encoded and so, on the surface, it does not contain 
non-ascii characters.
And (I presume) you cannot change the code of the client, so it will 
continue to send these invalid headers with a UTF-8 value, base64-encoded.
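On the client side, the header value is fixed by whichever charset turns "userid:password" into bytes before base64 encoding. A sketch (hypothetical class name) showing that the same credentials give different, equally ASCII-only, header values under UTF-8 and ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.util.Base64;

// Hypothetical sketch of how a client builds a Basic header value.
// "charset" stands for whatever the client uses, explicitly or via the JVM default.
class BasicHeaderBuilder {
    static String build(String userid, String password, Charset charset) {
        byte[] octets = (userid + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(octets);
    }
}
```

With the credentials from Gábor's example header, UTF-8 reproduces "Basic w7pzZXJfMDA3MjpqZWxzem8xMkFB", while ISO-8859-1 gives "Basic +nNlcl8wMDcyOmplbHN6bzEyQUE=".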


But the problem is that the standard Tomcat code which decodes the Basic 
Authorization header does not work in the way you want, for these 
illegal headers.
And this code should preferably not be changed in a way which breaks the 
conformance with standard HTTP.
Because if you do that, then your Tomcat becomes useless for anything
other than your special client.


An additional complication is that, if you want to use the embedded 
container-managed Tomcat authentication mechanisms, then you have to 
do something very early in the cycle, because that authentication takes 
place even before any servlet filter is invoked.


Up to Tomcat 5.5, you would have to do this in a Valve then, which has
the drawback that it is Tomcat-specific. (I think Tomcat 6 may give
other options, maybe not Tomcat-specific.)


Or, you drop the container-managed security, and you use something like 
the SecurityFilter (http://securityfilter.sourceforge.net/), but read 
the homepage carefully first.


So, to be pragmatic, I would tend to go in the following direction :
- create a Valve which
- checks the User-Agent. If it does not match your special client, do 
nothing.  If it matches, then

- get the Authorization header. If there is none, do nothing
- else, decode its value properly into a Unicode string
- re-encode this string in a way that fits with standard HTTP.  For 
example, replace each character by a string like {}, where  is 
the hex value of the Unicode codepoint of the character.

(That is always valid us-ascii, but check the maximum length).
- re-encode the result using base64
- replace the Authorization header value with this new string
- in your back-end authentication mechanism (I will suppose it is a 
database of userids/passwords), encode the userids/passwords the same 
way, and make this an alternate key
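The re-encoding steps above could look roughly like this. It is a sketch of the transformation only, not of the Valve plumbing, and it rests on assumptions the list leaves open: the client sends UTF-8, only non-ASCII code points are replaced, and the placeholder is "{" plus at least four uppercase hex digits plus "}". The class name is hypothetical. A real version would also need to escape a literal "{" so that the mapping stays unambiguous.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the re-encoding step only (no Valve plumbing).
// Assumptions: the client sends UTF-8, only non-ASCII code points are
// replaced, and the placeholder format is {XXXX} with uppercase hex digits.
class CredentialRewriter {

    // Replace every non-ASCII code point with "{" + hex code point + "}".
    // A literal '{' in the input is not escaped here.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(String.format("{%04X}", cp));
            }
        });
        return sb.toString();
    }

    // Decode the client's token as UTF-8, escape it, and re-encode it as a
    // base64 token whose decoded form is pure US-ASCII.
    static String rewriteToken(String base64Token) {
        byte[] octets = Base64.getDecoder().decode(base64Token.trim());
        String userPass = new String(octets, StandardCharsets.UTF_8);
        String escaped = escapeNonAscii(userPass);
        return Base64.getEncoder().encodeToString(escaped.getBytes(StandardCharsets.US_ASCII));
    }
}
```

For Gábor's example token the rewritten credentials decode to "{00FA}ser_0072:jelszo12AA", so the back-end would store that user id as "{00FA}ser_0072".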


The embedded Tomcat authentication will then decode the new base64 
string, split it into userid:password, and use them to verify the 
credentials, which will match.


If you do not like a Valve, then use a front-end server like Apache, and 
do the transformation of the header there, before the request is passed 
to Tomcat.
Alternatively then, you could also do the user authentication at the 
Apache level, and just pass the user-id to Tomcat.
(being an Apache/mod_perl guy myself, I find this last option much 
easier, but YMMV).


And all that for a few Ö's and Á's and ß's

Another option is to use a front-end Apache httpd server, which would 
modify the requests as follows :


(I presume that you have a way to identify requests coming from this 
particular client)(User-Agent header e.g.).


Create a filter at the Apache level, which detects your special client.
If it detects it, then it adds an additional header to the request
