subject:"Basic Authentication Failed with multibyte username"

Re: [OT] Basic Authentication Failed with multibyte username

2010-01-25 Thread Christopher Schultz

André,

On 1/24/2010 9:22 AM, André Warnier wrote:
> Christopher Schultz wrote:
>
>> Maybe all character sets have bytes 0-127 the same as US-ASCII, but I
> don't know about some of those I never see myself: Shift_JIS and all
>> those Asian encodings, etc. It would be better to be explicit.
> 
> With respect, I think you are mistaken here.
> Base64 encoding is essentially a method to encode groups of 3 bytes into
> groups of 4 bytes, in such a way that no byte in the resulting group
> has the high bit set. (Use "octet" instead of "byte" if it is more
> comfortable).

It's more than that: it uses an explicit set of characters in the
US-ASCII encoding as display. If you were to Base64 encode a string and
then transmit it as EBCDIC, it would look the same to human eyes but
have different underlying byte values (octets, if you prefer).

> Basically, it was created in order to allow 8-bit data to be
> sent over a 7-bit channel.
> So there is no character set implication at all in either encoding or
> decoding:
> - to encode, you take each group of 3 bytes, and encode it into a group
> of 4 bytes
> - to decode, you take each group of 4 bytes, and decode it into a group
> of 3 bytes.

Actually, I was wrong above: it's not a US-ASCII encoding. Instead, the
byte values are an index into a string of characters, as described in
the reference-less Wikipedia article:

"
The buffer is then used, six bits at a time, most significant first, as
indices into the string:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/", and
the indicated character is output.
"

So, in EBCDIC, the human reader would be confused :(
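To make the quoted description concrete, here is a minimal hand-rolled encoder. It is only an illustrative sketch (the class name `Base64Sketch` is hypothetical, and real code should use `java.util.Base64`). It shows that every 3 input octets become 4 output characters taken from a fixed 64-character alphabet, whatever character set originally produced those octets.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative Base64 encoder following the quoted description; not production code. */
class Base64Sketch {
    private static final String ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    static String encode(byte[] in) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < in.length; i += 3) {
            // Pack up to 3 input octets into a 24-bit buffer, most significant first.
            int n = Math.min(3, in.length - i);
            int buffer = 0;
            for (int j = 0; j < 3; j++) {
                buffer <<= 8;
                if (j < n) {
                    buffer |= in[i + j] & 0xFF;
                }
            }
            // Emit 4 characters, 6 bits at a time; only n + 1 of them carry data.
            for (int k = 0; k < 4; k++) {
                if (k <= n) {
                    int index = (buffer >> (18 - 6 * k)) & 0x3F;
                    out.append(ALPHABET.charAt(index));
                } else {
                    out.append('='); // padding for a short final group
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        byte[] credentials = "Aladdin:open sesame".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encode(credentials)); // prints QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```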

> So maybe the "authorization.getBytes()" above is wrong intellectually
> (if it implies that "authorization" is some kind of string expressed in
> a character set). The Base64-encoded "string" should really be read as
> bytes, because that is what it is.

Fair enough, though the above string fits nicely into US-ASCII which,
coincidentally, is the official encoding of HTTP headers :)

> The next step after the base64-decoding is where it matters

I agree, and here's where your arguments fall on deaf ears: each client
does whatever it wants with regard to encoding of this data. The major
web browsers don't even agree on what to do. Since the OP has his own
client (right? or have I gotten confused with one or two other threads
this week), he can do whatever he wants as long as the authentication
mechanism agrees with the client.

> But it is impossible to know which character set the browser used,
> just by examining that series of bytes.

Almost certainly true, although a tight client/server relationship could
include a scheme to indicate the encoding in the value itself. Something
like RFC2047, for instance.

> So there are only 2 choices possible :
> 
> 1) the rules specify that the base64-decoded "userid:password"
> string is always encoded using one specific charset.  In the case of
> HTTP, this would have to be iso-8859-1.
> (And in that case, HTTP Basic Authentication does not allow for
> non-iso-8859-1 userid's and passwords, and too bad for 80% of the world
> population)

I disagree: the spec is unclear about the encoding used before the
Base64 encoding. This is the source of the problem because clients have
decided to take it upon themselves to decide what is best (UTF-8, page
encoding, random encoding, no encoding, etc.).

> 2) the rules specify something like :
> - if the base64-decoded authorization token does not start with the
> iso-8859-1 characters "=?", then it is interpreted as iso-8859-1 (the
> default)
> - if it starts with "=?" and ends with "?=", then it is interpreted as a
> rfc2047-encoded token, to be decoded using the charset indicated after
> the leading "=?".
> (And user-id's starting with "=?" are forbidden, but that's not a very
> likely case nor a big limitation).

That would be a great implementation, but nobody appears to have done
it. If the OP wants to use this strategy, he'll have to hack Tomcat's
authenticator to accept this type of encoding... or use something like
Securityfilter, again, with a patch to accept this type of encoding.

> So back to Gábor's original problem :
> 
> His specific "client" is not a browser, and it allows a user:password
> string to contain non-iso-8859-1 characters, and it encodes it in UTF-8,
> prior to encoding it with base64.

Fortunately, he has control over the client, which is great.

> At the Tomcat level :
> 
> If Gábor modifies the Tomcat container-managed Basic Authentication
> code, so that it will first base64-decode the token, then convert it to
> a string using UTF-8 encoding, that will work for requests from this
> special client.  But it will break with any other client.

+1

> If Gábor can distinguish requests from this special client, from
> requests from standard clients, then he could make the UTF-8 decoding
> conditional on where the request comes from.

+1

Re: [OT] Basic Authentication Failed with multibyte username

2010-01-24 Thread André Warnier


Christopher Schultz wrote:
> André,
>
> (Marking OT because, well... just because).
>
> On 1/22/2010 2:59 PM, Warnier wrote:
>> Christopher Schultz wrote:
>>> That "authorization.getBytes()" is just asking for trouble, because it
>>> uses the platform default encoding to convert characters to bytes. It
>>> should be using US-ASCII, ISO-8859-1, or something like that.
>>
>> -1
>> I don't think you have a problem there, because what you are decoding
>> into bytes there IS bytes (it is base64-encoded).
>
> Maybe all character sets have bytes 0-127 the same as US-ASCII, but I
> don't know about some of those I never see myself: Shift_JIS and all
> those Asian encodings, etc. It would be better to be explicit.


With respect, I think you are mistaken here.
Base64 encoding is essentially a method to encode groups of 3 bytes into
groups of 4 bytes, in such a way that no byte in the resulting group
has the high bit set. (Use "octet" instead of "byte" if it is more
comfortable).
Basically, it was created in order to allow 8-bit data to be
sent over a 7-bit channel.
So there is no character set implication at all in either encoding or
decoding:
- to encode, you take each group of 3 bytes, and encode it into a group
of 4 bytes
- to decode, you take each group of 4 bytes, and decode it into a group
of 3 bytes.

So maybe the "authorization.getBytes()" above is wrong intellectually
(if it implies that "authorization" is some kind of string expressed in
a character set). The Base64-encoded "string" should really be read as 
bytes, because that is what it is.


The next step after the base64-decoding is where it matters : now we 
have an array of bytes with values 0-255, and we have to interpret it 
into a "userid:password" string which /might/ be us-ascii or iso-8859-1, 
but might also be something else.

But it is impossible to know which character set the browser used,
just by examining that series of bytes.  Inherently, nothing
distinguishes one series of bytes from another, and they could just as
well represent an iso-8859-1 string as an iso-8859-2,3,4,5... or a UTF-8
string.
You can examine a series of bytes and tell whether it could
be a valid UTF-8 string (because some byte sequences are not possible
under UTF-8).  But even if it could be valid UTF-8, that does not mean
it is UTF-8; and distinguishing different iso-8859-x byte sequences from
one another is totally impossible.
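The one test that is possible, ruling UTF-8 in or out, can be sketched with the standard `CharsetDecoder` API. The class and method names are hypothetical. A `true` result only says the octets *could* be UTF-8: the pair C3 A9 is "é" in UTF-8, but it is equally valid ISO-8859-1 ("Ã©").

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    /** Returns true if the bytes form a well-formed UTF-8 sequence (not proof that they ARE UTF-8). */
    static boolean couldBeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(couldBeUtf8(new byte[] {(byte) 0xC3, (byte) 0xA9})); // true
        System.out.println(couldBeUtf8(new byte[] {(byte) 0xE9, 0x74}));        // false
    }
}
```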


Example:
We receive a base64 authorization token which, once it is base64-decoded,
results in the following series of octets, shown in hex:

73 63 68 75 6C 74 7A 3A C3 A9 74 C3 A9
If we decode this as being utf-8, we get the string
schultz:été
and we would thus suppose that this userid is "schultz" and his password
is "été".
But if we decide that the origin character set was iso-8859-1, then we
would decode it into
schultz:Ã©tÃ©
and the user would still be "schultz", but his password would be "Ã©tÃ©"
(which would be an equally-valid password).
There is no way to decide in the absolute which decoding is "right",
in the absence of more information.
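The ambiguity in this example is easy to reproduce. In the minimal sketch below (the class name is hypothetical), the same 13 octets are decoded with two explicit charsets, and each decoding yields a different but perfectly well-formed password.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class AmbiguousCredentials {
    // The decoded octets from the example: "schultz:" followed by C3 A9 74 C3 A9.
    static final byte[] OCTETS = {
        0x73, 0x63, 0x68, 0x75, 0x6C, 0x74, 0x7A, 0x3A,
        (byte) 0xC3, (byte) 0xA9, 0x74, (byte) 0xC3, (byte) 0xA9
    };

    /** Returns the part after the first ':' when OCTETS are read in the given charset. */
    static String password(Charset charset) {
        String userPass = new String(OCTETS, charset);
        return userPass.substring(userPass.indexOf(':') + 1);
    }

    public static void main(String[] args) {
        // Same octets, two "valid" passwords of different lengths.
        System.out.println(password(StandardCharsets.UTF_8).length());      // 3 (UTF-8 reading)
        System.out.println(password(StandardCharsets.ISO_8859_1).length()); // 5 (ISO-8859-1 reading)
    }
}
```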


So there are only 2 choices possible :

1) the rules specify that the base64-decoded "userid:password"
string is always encoded using one specific charset.  In the case of
HTTP, this would have to be iso-8859-1.
(And in that case, HTTP Basic Authentication does not allow for
non-iso-8859-1 userid's and passwords, and too bad for 80% of the world 
population)


or

2) the rules specify something like :
- if the base64-decoded authorization token does not start with the
iso-8859-1 characters "=?", then it is interpreted as iso-8859-1 (the 
default)
- if it starts with "=?" and ends with "?=", then it is interpreted as a 
rfc2047-encoded token, to be decoded using the charset indicated after 
the leading "=?".
(And user-id's starting with "=?" are forbidden, but that's not a very 
likely case nor a big limitation).
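Option 2 could be sketched roughly as follows. This is only an illustration of the proposed rule: the class and method names are hypothetical, no existing authenticator is claimed to work this way, and only the "B" (Base64) form of an RFC 2047 encoded-word is handled. The "Q" form is left out for brevity.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical decoder for "option 2"; not part of Tomcat or any other library. */
class Rfc2047Credentials {
    /**
     * Takes the octets obtained by Base64-decoding the Authorization token and
     * returns "userid:password". Only the "B" encoding of RFC 2047 is supported.
     */
    static String decodeUserPass(byte[] octets) {
        // ISO-8859-1 maps each octet to exactly one char, so nothing is lost.
        String latin1 = new String(octets, StandardCharsets.ISO_8859_1);
        if (latin1.length() < 4 || !latin1.startsWith("=?") || !latin1.endsWith("?=")) {
            return latin1; // the default: plain ISO-8859-1
        }
        // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
        String[] parts = latin1.substring(2, latin1.length() - 2).split("\\?", 3);
        if (parts.length != 3 || !parts[1].equalsIgnoreCase("B")) {
            throw new IllegalArgumentException("unsupported encoded-word: " + latin1);
        }
        Charset charset = Charset.forName(parts[0]);
        byte[] raw = Base64.getDecoder().decode(parts[2]);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        byte[] plain = "Aladdin:open sesame".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeUserPass(plain)); // Aladdin:open sesame
    }
}
```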


So back to Gábor's original problem :

His specific "client" is not a browser, and it allows a user:password 
string to contain non-iso-8859-1 characters, and it encodes it in UTF-8, 
prior to encoding it with base64.


At the Tomcat level :

If Gábor modifies the Tomcat container-managed Basic Authentication 
code, so that it will first base64-decode the token, then convert it to 
a string using UTF-8 encoding, that will work for requests from this 
special client.  But it will break with any other client.


If Gábor can distinguish requests from this special client, from 
requests from standard clients, then he could make the UTF-8 decoding 
conditional on where the request comes from.
If this is done in the container-based Basic Authentication code, then 
it would still result in a non-standard Tomcat, but at least it would 
not break with normal clients.


If Gábor drops the container-based authentication, and uses a servlet 
filter like SecurityFilter (modified the same way), then that would have 
the advantage of keeping a standard Tomcat, and also of working

Re: [OT] Basic Authentication Failed with multibyte username

2010-01-22 Thread Christopher Schultz

André,

(Marking OT because, well... just because).

On 1/22/2010 2:59 PM, Warnier wrote:
> Christopher Schultz wrote:
>> That "authorization.getBytes()" is just asking for trouble, because it
>> uses the platform default encoding to convert characters to bytes. It
>> should be using US-ASCII, ISO-8859-1, or something like that.
> 
> -1
> I don't think you have a problem there, because what you are decoding
> into bytes there IS bytes (it is base64-encoded).

Maybe all character sets have bytes 0-127 the same as US-ASCII, but I
don't know about some of those I never see myself: Shift_JIS and all
those Asian encodings, etc. It would be better to be explicit.

>> It also calls the String constructor with a byte array without
>> specifying the encoding, therefore using the platform default.
> 
> +1
> That is indeed where you have a problem.  There you SHOULD always decode
> it as US-ASCII (or maybe iso-8859-1, I'm not quite sure what the spec
> says exactly).

From my reading, the spec is silent but one can draw the conclusion that
US-ASCII is basically all that is supported. I should add the capability
of configuring this encoding to override the (soon to be) default of
US-ASCII: if the user knows the client will use UTF-8, they should be
allowed to force that encoding to be used.

> Let's say that the spec is clear and says that the header value is
> *TEXT, and that *TEXT is always US-ASCII (or ISO-8859-1) by default.
> 
> Let's take it from the browser side first.
> If the "userid:password" is indeed composed only of us-ascii characters,
> then the browser base64-encodes this directly and it is trivial.(*)
> 
> But let's say that "userid:password" is something else than us-ascii.
> Another part of the spec says that then, you have to encode it according
> to RFC2047.

No, I don't think this is correct: the spec says that the HTTP header
values must be in US-ASCII, and may be encoded using RFC2047 in order to
achieve that. Since Base64 encoding always results in a
US-ASCII-compatible value, there is no reason to involve RFC2047.

> My contention is then that the browser should first RFC2047-encode
> "userid:password", and then base64-encode the result.

While that sounds like a good idea, it's almost certainly never done
that way.

> Back on the server side.
> The server base64-decodes the authorization token, into an ascii string.
> It can do that always, because either the string was ascii to start
> with, or else it was not, but then it has been RFC2047-encoded, yielding
> a result that is ascii.
> (like : =?iso-8859-2?B?base64-encoded stuff...?= )

This would be a decent configurable setting for a BASIC authenticator...
something like "allow-rfc2047" or whatever. What about those people who
really want to have a username like "=?whatever" and a password like
"whatever?="? They can't login? :)

> The above, I believe, would be totally consistent with the current RFCs.

Yes, but for whatever reason, nobody ever fully implements the RFCs :)
There are standards and there are practices. In this case, I think
practices outweigh the standards :)

> But there is a major catch : I don't believe that there is a browser on
> the market today, which "properly" encodes the "userid:password" string
> via rfc2047 when it isn't ascii.

Nor would it be appropriate to do so, because base64 encoding is
/always/ used and will therefore /always/ result in a valid HTTP
Authorization header value.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

Re: Basic Authentication Failed with multibyte username

2010-01-22 Thread André Warnier


Christopher Schultz wrote:
> André,
>
> On 1/21/2010 6:35 PM, André Warnier wrote:
>> Basically, I would tend to say that if the server knows who the clients
>> are and vice-versa, you should be free to use any encoding you want,
>> with the limitation that what is exchanged on the wire conforms to HTTP
>> (because there may be proxies on the way which are not so tolerant).
>
> +1
>
>> What the client is sending is already (in a way) conformant to HTTP,
>> because it is base64 encoded and so, on the surface, it does not contain
>> non-ascii characters.
>
> +1
>
>> But the problem is that the standard Tomcat code which decodes the Basic
>> Authorization header does not work in the way you want, for these
>> illegal headers.
>> And this code should preferably not be changed in a way which breaks the
>> conformance with standard HTTP.
>> Because if you do that, then your Tomcat becomes useless for anything
>> else than your special client.
>
> +1
>
> Another possibility would be to use something like SecurityFilter, which
> allows you to (more easily) write your own authenticator and realm
> implementations, and you could write a BasicAuthenticator that reads
> these specially-formatted credentials.
>
> I checked the sf source, and it looks like we might have a bug:
>
>     private String decodeBasicAuthorizationString(String authorization) {
>         if (authorization == null ||
>                 !authorization.toLowerCase().startsWith("basic ")) {
>             return null;
>         } else {
>             authorization = authorization.substring(6).trim();
>             // Decode and parse the authorization credentials
>             return new String(Base64.decodeBase64(authorization.getBytes()));
>         }
>     }
>
> That "authorization.getBytes()" is just asking for trouble, because it
> uses the platform default encoding to convert characters to bytes. It
> should be using US-ASCII, ISO-8859-1, or something like that.

-1
I don't think you have a problem there, because what you are decoding
into bytes there IS bytes (it is base64-encoded).

> It also calls the String constructor with a byte array without
> specifying the encoding, therefore using the platform default.

+1
That is indeed where you have a problem.  There you SHOULD always decode
it as US-ASCII (or maybe iso-8859-1, I'm not quite sure what the spec
says exactly).

Let's say that the spec is clear and says that the header value is 
*TEXT, and that *TEXT is always US-ASCII (or ISO-8859-1) by default.


Let's take it from the browser side first.
If the "userid:password" is indeed composed only of us-ascii characters, 
then the browser base64-encodes this directly and it is trivial.(*)


But let's say that "userid:password" is something else than us-ascii.
Another part of the spec says that then, you have to encode it according 
to RFC2047.
My contention is then that the browser should first RFC2047-encode 
"userid:password", and then base64-encode the result.

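On the client side, the proposal in the previous paragraph (RFC 2047 first, then the usual Base64 step) might look like the sketch below. It is hypothetical throughout: as noted in this thread, no browser is known to do this, the class name is invented, and UTF-8 is simply an assumed choice for the inner charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical client for the proposed scheme; no real browser or library works this way. */
class Rfc2047BasicClient {
    static String authorizationHeader(String userid, String password) {
        String userPass = userid + ":" + password;
        String token;
        if (StandardCharsets.US_ASCII.newEncoder().canEncode(userPass)) {
            token = userPass; // plain US-ASCII: nothing extra to do
        } else {
            // RFC 2047 encoded-word, "B" form: =?charset?B?base64-text?=
            String text = Base64.getEncoder()
                .encodeToString(userPass.getBytes(StandardCharsets.UTF_8));
            token = "=?UTF-8?B?" + text + "?=";
        }
        // Outer Base64 step required by RFC 2617; token is pure US-ASCII at this point.
        return "Basic " + Base64.getEncoder()
            .encodeToString(token.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(authorizationHeader("Aladdin", "open sesame"));
        // prints Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```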

Back on the server side.
The server base64-decodes the authorization token, into an ascii string.
It can do that always, because either the string was ascii to start 
with, or else it was not, but then it has been RFC2047-encoded, yielding
a result that is ascii.

(like : =?iso-8859-2?B?base64-encoded stuff...?= )

Then the server must do another round of decoding via RFC2047.
That consists of a double decoding again : base64-decode the string 
between the ?? into bytes, and then decode those bytes into Unicode, 
using the charset indicated at the beginning of the rfc2047-encoded 
sequence.



The above, I believe, would be totally consistent with the current RFCs.

But there is a major catch : I don't believe that there is a browser on 
the market today, which "properly" encodes the "userid:password" string 
via rfc2047 when it isn't ascii.


And the OP's special client sends UTF-8, but also does not 
rfc2047-encode it.




Re: Basic Authentication Failed with multibyte username

2010-01-22 Thread Christopher Schultz

André,

On 1/21/2010 6:35 PM, André Warnier wrote:
> Basically, I would tend to say that if the server knows who the clients
> are and vice-versa, you should be free to use any encoding you want,
> with the limitation that what is exchanged on the wire conforms to HTTP
> (because there may be proxies on the way which are not so tolerant).

+1

> What the client is sending is already (in a way) conformant to HTTP,
> because it is base64 encoded and so, on the surface, it does not contain
> non-ascii characters.

+1

> But the problem is that the standard Tomcat code which decodes the Basic
> Authorization header does not work in the way you want, for these
> illegal headers.
> And this code should preferably not be changed in a way which breaks the
> conformance with standard HTTP.
> Because if you do that, then your Tomcat becomes useless for anything
> else than your special client.

+1

Another possibility would be to use something like SecurityFilter, which
allows you to (more easily) write your own authenticator and realm
implementations, and you could write a BasicAuthenticator that reads
these specially-formatted credentials.

I checked the sf source, and it looks like we might have a bug:

    private String decodeBasicAuthorizationString(String authorization) {
        if (authorization == null ||
                !authorization.toLowerCase().startsWith("basic ")) {
            return null;
        } else {
            authorization = authorization.substring(6).trim();
            // Decode and parse the authorization credentials
            return new String(Base64.decodeBase64(authorization.getBytes()));
        }
    }

That "authorization.getBytes()" is just asking for trouble, because it
uses the platform default encoding to convert characters to bytes. It
should be using US-ASCII, ISO-8859-1, or something like that.

It also calls the String constructor with a byte array without
specifying the encoding, therefore using the platform default.

Finally, this method is private, which means it cannot be overridden by
a subclass, which would be a nice feature. Maybe I'll fix all that. :)
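A possible rework of that method, addressing the three points above, is sketched below. It is hypothetical and not SecurityFilter's actual code. To stay self-contained it swaps Commons Codec's `Base64.decodeBase64` for `java.util.Base64` (Java 8+). The ISO-8859-1 default is an assumption that follows the spec-compliance argument made elsewhere in this thread, and the charset is configurable for deployments that know their clients send UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Hypothetical rework of the quoted method; not SecurityFilter's real implementation. */
class BasicAuthorizationDecoder {
    private final Charset credentialsCharset;

    /** Assumed default: ISO-8859-1. */
    BasicAuthorizationDecoder() {
        this(StandardCharsets.ISO_8859_1);
    }

    /** Lets a deployment that knows its clients (e.g. UTF-8 JAX-WS) choose another charset. */
    BasicAuthorizationDecoder(Charset credentialsCharset) {
        this.credentialsCharset = credentialsCharset;
    }

    /** Protected instead of private, so a subclass can override it. */
    protected String decodeBasicAuthorizationString(String authorization) {
        // Locale.ROOT avoids locale surprises such as the Turkish dotless i.
        if (authorization == null
                || !authorization.toLowerCase(Locale.ROOT).startsWith("basic ")) {
            return null;
        }
        String token = authorization.substring(6).trim();
        byte[] octets;
        try {
            // The Base64 token is pure US-ASCII, so convert it explicitly.
            octets = Base64.getDecoder().decode(token.getBytes(StandardCharsets.US_ASCII));
        } catch (IllegalArgumentException malformed) {
            return null; // treat a malformed token like missing credentials
        }
        // Interpret the decoded octets with an explicit, configurable charset.
        return new String(octets, credentialsCharset);
    }

    public static void main(String[] args) {
        BasicAuthorizationDecoder decoder = new BasicAuthorizationDecoder();
        System.out.println(decoder.decodeBasicAuthorizationString(
            "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==")); // Aladdin:open sesame
    }
}
```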

> Or, you drop the container-managed security, and you use something like
> the SecurityFilter (http://securityfilter.sourceforge.net/), but read
> the homepage carefully first.

Note that the warning about BASIC authentication is waaay outdated: sf
definitely does support BASIC auth.

-chris

Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier


To get back to the underlying issue :

Auth Gábor wrote:
> So... this is the real chaos... :)

Yes.

> By the way, my users do not use HTML browsers, they are using JAX-WS in their
> client program, and JAX-WS sends authentication data in UTF-8 (like
> Opera), because the default encoding is UTF-8 in the client JVM (and the
> server too).

Basically, I would tend to say that if the server knows who the clients 
are and vice-versa, you should be free to use any encoding you want, 
with the limitation that what is exchanged on the wire conforms to HTTP 
(because there may be proxies on the way which are not so tolerant).


What the client is sending is already (in a way) conformant to HTTP, 
because it is base64 encoded and so, on the surface, it does not contain 
non-ascii characters.
And (I presume) you cannot change the code of the client, so it will 
continue to send these "invalid" headers with a UTF-8 value, base64-encoded.


But the problem is that the standard Tomcat code which decodes the Basic 
Authorization header does not work in the way you want, for these 
illegal headers.
And this code should preferably not be changed in a way which breaks the 
conformance with standard HTTP.
Because if you do that, then your Tomcat becomes useless for anything 
else than your special client.


An additional complication is that, if you want to use the embedded 
"container-managed" Tomcat authentication mechanisms, then you have to 
do something very early in the cycle, because that authentication takes 
place even before any servlet filter is invoked.


Up to Tomcat 5.5, you would have to do this in a Valve, which has
the drawback of being Tomcat-specific.  (I think Tomcat 6 may give
other options, maybe not Tomcat-specific.)


Or, you drop the container-managed security, and you use something like 
the SecurityFilter (http://securityfilter.sourceforge.net/), but read 
the homepage carefully first.


So, to be pragmatic, I would tend to go in the following direction :
- create a Valve which
- checks the User-Agent. If it does not match your special client, do 
nothing.  If it matches, then

- get the Authorization header. If there is none, do nothing
- else, decode its value properly into a Unicode string
- re-encode this string in a way that fits with standard HTTP.  For
example, replace each character by its Unicode codepoint, written as a
hex value inside braces.

(That is always valid us-ascii, but check the maximum length).
- re-encode the result using base64
- replace the Authorization header value with this new string
- in your back-end authentication mechanism (I will suppose it is a 
database of userids/passwords), encode the userids/passwords the same 
way, and make this an alternate key
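The header transformation in the steps above could look roughly like this. Only the transformation itself is sketched; the Valve wiring (or the Apache-side equivalent) is not shown, and the class and method names are hypothetical. It assumes the special client always sends UTF-8. It also escapes only non-ASCII characters, because escaping the ":" separator would prevent Tomcat from splitting userid and password. The `{00E9}` form (at least four uppercase hex digits) is an illustrative format choice.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

/** Hypothetical helper for the re-encoding idea; the Valve plumbing is not shown. */
class CredentialReencoder {
    /** Replaces every non-ASCII code point with "{hhhh}" (its hex code point). */
    static String escapeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append(String.format(Locale.ROOT, "{%04X}", cp));
            }
        });
        return out.toString();
    }

    /** Turns the special client's UTF-8 Basic header into a pure US-ASCII one. */
    static String reencodeAuthorization(String headerValue) {
        if (headerValue == null || !headerValue.regionMatches(true, 0, "Basic ", 0, 6)) {
            return headerValue; // not Basic credentials: leave untouched
        }
        String token = headerValue.substring(6).trim();
        String userPass = new String(Base64.getDecoder().decode(token), StandardCharsets.UTF_8);
        String escaped = escapeNonAscii(userPass);
        return "Basic " + Base64.getEncoder()
            .encodeToString(escaped.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("\u00d6\u00c1\u00df")); // {00D6}{00C1}{00DF}
    }
}
```

One caveat of this format is that a plain-ASCII password that literally contains something like "{00E9}" would collide with an escaped one. The stored userids and passwords must also be escaped in exactly the same way.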


The embedded Tomcat authentication will then decode the new base64 
string, split it into userid:password, and use them to verify the 
credentials, which will match.


If you do not like a Valve, then use a front-end server like Apache, and 
do the transformation of the header there, before the request is passed 
to Tomcat.
Alternatively then, you could also do the user authentication at the 
Apache level, and just pass the user-id to Tomcat.
(being an Apache/mod_perl guy myself, I find this last option much 
easier, but YMMV).


And all that for a few Ö's and Á's and ß's

Another option is to use a front-end Apache httpd server, which would 
modify the requests as follows :


(I presume that you have a way to identify requests coming from this 
particular client)(User-Agent header e.g.).


Create a filter at the Apache level, which detects your special client.
If it detects it, then it adds an additional header to the request


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier


Christopher Schultz wrote:
> ...
>
> Nice that someone looked at actual behavior of the browsers.


There is an easy way to find out what really happens.
Gábor,
I presume that you have a workstation set for iso-8859-2 (or whichever 
non iso-8859-1 charset is appropriate for Magyar, I forgot), and a 
browser set up similarly.
Could you get one of these add-ons like Fiddler2 or LiveHttpHeaders, and 
arrange to capture what is sent by the browser in its authorization 
header when you enter a user-id/password containing some characters of 
the range above \x9F ?

That should be the base64 encoding of whatever the browser is sending.
Then of course you'll have to find a way to show us the base64-encoded 
form, and the corresponding non-encoded form of ditto (but I think that 
composing and sending your post as UTF-8 should do the trick).


We could probably do much the same with our own charset-challenged 
browsers, but we don't have the easiest keyboards for that.


It is my deep suspicion that the browsers will just take the input as 
iso-latin-x (whatever the workstation/browser is set for), and 
base64-encode it, without bothering to indicate the real charset in any 
way.  But we'll see.


Köszönöm szépen (Hungarian for "thank you very much"), I think it is...





Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Christopher Schultz

Gábor,

On 1/21/2010 9:16 AM, Auth Gábor wrote:
> Mark Thomas wrote:
>>    OCTET  = <any 8-bit sequence of data>
>>    CTL    = <any US-ASCII control character
>>             (octets 0 - 31) and DEL (127)>
>>
>> So actually, Tomcat is correct in the current treatment of credentials.
>> Therefore, not a bug.
>
> Yes, but UTF-8 encoded text can contain any 8-bit sequence of data except
> control characters, so IMHO UTF-8 encoded text is TEXT.

Sure, UTF-8 encoded text is TEXT, but you may not get the String value
you expect. André is correct in that non-Latin characters appear to be
unsupported by the HTTP Authorization header.

Now, there /are/ things that can be done to accommodate you. See below.

The patch you posted probably will only work when the platform encoding
is set to UTF-8. Instead, an encoding setting would probably have to be
provided to the BasicAuthenticator to allow the Base64-encoded header
value to use the desired encoding. Actually, the code as it looks right
now does have a bug: the platform default encoding is used to decode
the Base64-decoded bytes in the Authorization header. Instead, it should
probably be ASCII or maybe ISO-8859-1.

>> Also André's comments regarding ISO-8859-1 were right if considering the
>> actual user name and password rather than the header.
> 
> Yes, that's right. The default header encoding is ISO-8859-1.

It's ASCII, though ISO-8859-1 is backward-compatible (as is UTF-8).

> I've found some information about this issue:
> http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

Nice that someone looked at actual behavior of the browsers.

It would be pretty trivial to add a settable charset to the
BasicAuthenticator, and also to allow things like RFC 2047
charset-in-value decoding, though I don't think that's appropriate
because the Base64 value has already been decoded.

-chris

Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Auth Gábor

Hi,

Mark Thomas wrote:
>    OCTET  = <any 8-bit sequence of data>
>    CTL    = <any US-ASCII control character
>             (octets 0 - 31) and DEL (127)>
> 
> So actually, Tomcat is correct in the current treatment of credentials.
> Therefore, not a bug.

Yes, but UTF-8 encoded text can contain any 8-bit sequence of data except
control characters, so IMHO UTF-8 encoded text is TEXT.

> Also André's comments regarding ISO-8859-1 were right if considering the
> actual user name and password rather than the header.

Yes, that's right. The default header encoding is ISO-8859-1.

> Supporting other encodings would be a useful enhancement but the default
> will have to be ISO-8859-1 to remain spec compliant. What the browsers
> will do for user names and passwords in other encodings is not defined
> so it will be a case of YMMV.

I've found some information about this issue:
http://stackoverflow.com/questions/702629/utf-8-characters-mangled-in-http-basic-auth-username

So... this is the real chaos... :)

By the way, my users do not use HTML browsers, they are using JAX-WS in their
client program, and JAX-WS sends authentication data in UTF-8 (like
Opera), because the default encoding is UTF-8 in the client JVM (and the
server too).

Gábor Auth


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier


Mark Thomas wrote:
> On 21/01/2010 06:55, André Warnier wrote:
>> Mark Thomas wrote:
>>> The authorisation header is base64
>>> encoded so it is automatically compliant with RFC2616.
>>
>> Yes, it sounds like you're right; my mistake.
>> (Also for Gabor, I admit my mistake.)
>>
>> I agree that the HTTP header itself is correct.
>> But there is still something which puzzles me in the absolute.
>> Suppose that the browser and the server know nothing particular about
>> one another, and that the server gets such an Authorization header from
>> the browser.
>> The Base64 decoding is done, and yields a series of bytes.
>> Now this series of bytes has to be interpreted, to be translated into a
>> string in Java (which is Unicode).  Which encoding should be chosen to
>> decode the byte array?
>> If you use the default platform JVM encoding, you are making the
>> assumption that the browser knew what this encoding is, aren't you?
>> On the other hand, the browser sent nothing to indicate in which
>> encoding this string was, before it encoded it using Base64, or did it?
>
> RFC2617 to the rescue...
>
>   basic-credentials = base64-user-pass
>   base64-user-pass  = <base64 encoding of user-pass,
>                        except not limited to 76 char/line>
>   user-pass         = userid ":" password
>   userid            = *<TEXT excluding ":">
>   password          = *TEXT
>
> *TEXT is defined in RFC2616
>
>   TEXT   = <any OCTET except CTLs,
>            but including LWS>
>
> and finally
>
>   OCTET  = <any 8-bit sequence of data>
>   CTL    = <any US-ASCII control character
>            (octets 0 - 31) and DEL (127)>
>
> So actually, Tomcat is correct in the current treatment of credentials.
> Therefore, not a bug.
>
> Also André's comments regarding ISO-8859-1 were right if considering the
> actual user name and password rather than the header.
>
> Supporting other encodings would be a useful enhancement but the default
> will have to be ISO-8859-1 to remain spec compliant. What the browsers
> will do for user names and passwords in other encodings is not defined
> so it will be a case of YMMV.
>
> Mark


Let me be even more pernickety:

According to the HTTP 1.1 RFC 2616, words of *TEXT in HTTP header fields MAY
contain characters from character sets other than ISO-8859-1, but only
when encoded according to the rules of RFC 2047.


RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this
should be done using the form:

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for 
the charset.)
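A minimal Java sketch of how such a "B" (Base64) encoded-word could be built, for illustration only. It writes the charset label using Java's canonical name, `UTF-8`, which is also the IANA-registered name; RFC 2047 treats charset names as case-independent, so `utf-8` is equivalent. The class and method names are hypothetical, and the sketch ignores RFC 2047's 75-character limit on a single encoded-word.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class EncodedWordExample {

    /** Builds an RFC 2047 encoded-word using the "B" (Base64) encoding. */
    static String encodeWord(String text, Charset charset) {
        String encodedText = Base64.getEncoder().encodeToString(text.getBytes(charset));
        // Form: "=?" charset "?" encoding "?" encoded-text "?="
        return "=?" + charset.name() + "?B?" + encodedText + "?=";
    }

    public static void main(String[] args) {
        // Prints =?UTF-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=
        System.out.println(encodeWord("Aladdin:open sesame", StandardCharsets.UTF_8));
    }
}
```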


(Aside: this is something one regularly finds in email headers, but I
have never seen it used in HTTP headers until now.)


On the other hand, regarding authentication mechanisms, RFC 2616 refers 
to RFC 2617, which itself indicates the following format for an 
authorization header sent by the browser to the server :


Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

When base64-decoded, the above string should look like "userid:password".
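The decoding step for RFC 2617's example can be sketched in Java. This is a minimal illustration, not Tomcat's implementation: `decodeBasic` is a hypothetical helper, and it assumes the decoded octets are read as ISO-8859-1, following the reading of RFC 2616 TEXT quoted above. It splits at the first colon only, because the grammar excludes `:` from the userid but not from the password.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class BasicHeaderExample {

    /**
     * Decodes the value of an "Authorization: Basic ..." header into {userid, password}.
     * The password element is null when the decoded text contains no ':'.
     */
    static String[] decodeBasic(String headerValue) {
        String scheme = "Basic ";
        // The auth-scheme token is case-insensitive ("Basic", "BASIC", ...).
        if (!headerValue.regionMatches(true, 0, scheme, 0, scheme.length())) {
            throw new IllegalArgumentException("Not Basic credentials: " + headerValue);
        }
        String token = headerValue.substring(scheme.length()).trim();
        byte[] octets = Base64.getDecoder().decode(token);
        // Assumption: the octets are read as ISO-8859-1 (the RFC 2616 TEXT reading).
        String userPass = new String(octets, StandardCharsets.ISO_8859_1);
        // userid excludes ':' but the password may contain it, so split at the first ':' only.
        int colon = userPass.indexOf(':');
        if (colon < 0) {
            return new String[] {userPass, null};
        }
        return new String[] {userPass.substring(0, colon), userPass.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] creds = decodeBasic("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(creds[0] + " / " + creds[1]); // Aladdin / open sesame
    }
}
```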

I did not find in RFC 2617 any specific mention of character set 
encoding, but it itself refers back to RFC 2616 as being the "base 
rules". And the base rules in RFC 2616 seem to be that header values are 
US-ASCII unless otherwise indicated.


In other words, my contention is as follows :

- if the "userid:password" string above contains only US-ASCII characters,
then the above simple form of the header is fine.
- if the "userid:password" string above contains characters other than
US-ASCII, however, then they should be further encoded, using the rules
of RFC 2047.

This would mean that you should have something like :

Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=

(or, maybe, the other way around : it is the 
"QWxhZGRpbjpvcGVuIHNlc2FtZQ" string which, when base64-decoded, should 
yield a new string of the form 
"=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=", which should then be decoded 
once more to give the "userid:password" string).


Now, I am not sure that, if you pass such an HTTP header, encoded as
above, from Apache to Tomcat, the Tomcat getHeader() call will
properly decode it using the indicated charset.


And I am not sure either that there exists any browser on the market 
that will encode a userid:password string that way.




Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas

On 21/01/2010 06:55, André Warnier wrote:
> Mark Thomas wrote:
>> The authorisation header is base64
>> encoded so it is automatically compliant with RFC2616.
>>
> Yes, it sounds like you're right; my mistake.
> (Also for Gabor, I admit my mistake.)
> 
> I agree that the HTTP header itself is correct.
> But there is still something which puzzles me in the absolute.
> Suppose that the browser and the server know nothing particular about
> one another, and that the server gets such an Authorization header from
> the browser.
> The Base64 decoding is done, and yields a series of bytes.
> Now this series of bytes has to be interpreted, to be translated into a
> string in Java (which is Unicode). Which encoding should be chosen to
> decode the byte array?
> If you use the default platform JVM encoding, you are making the
> assumption that the browser knew what this encoding is, aren't you ?
> On the other hand, the browser sent nothing to indicate in which
> encoding this string was, before it encoded it using Base64, or did it ?

RFC2617 to the rescue...

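  basic-credentials = base64-user-pass
  base64-user-pass  = <base64 [4] encoding of user-pass,
                       except not limited to 76 char/line>
  user-pass   = userid ":" password
  userid      = *<TEXT excluding ":">
  password    = *TEXT

*TEXT is defined in RFC2616

   TEXT           = <any OCTET except CTLs,
                    but including LWS>

and finally

   OCTET          = <any 8-bit sequence of data>
   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>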

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.

Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.

Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.
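Why an ISO-8859-1 default is the conservative choice can be sketched in Java: ISO-8859-1 maps each octet 0x00-0xFF to the code point U+0000-U+00FF, so decoding never fails and re-encoding returns the original bytes. A UTF-8 username decoded this way comes out garbled but is still recoverable. The class and method names are hypothetical, chosen only for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Latin1RoundTrip {

    /** ISO-8859-1 maps each octet 0x00-0xFF to U+0000-U+00FF, so decoding never fails. */
    static String asLatin1(byte[] octets) {
        return new String(octets, StandardCharsets.ISO_8859_1);
    }

    /** Recovers the original octets of an ISO-8859-1-decoded string and reads them as UTF-8. */
    static String reinterpretAsUtf8(String latin1) {
        byte[] original = latin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u00FAser".getBytes(StandardCharsets.UTF_8); // octets C3 BA 73 65 72
        String latin1 = asLatin1(utf8); // five chars: U+00C3 U+00BA "ser"
        boolean lossless = Arrays.equals(utf8, latin1.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(latin1.length() + " " + lossless); // 5 true
        System.out.println(reinterpretAsUtf8(latin1).equals("\u00FAser")); // true
    }
}
```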

Mark


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier


Mark Thomas wrote:
> On 21/01/2010 06:12, André Warnier wrote:
>> Auth Gábor wrote:
>>> Hi,
>>>
>>> I've found a potential bug in the Basic Authentication module. I have
>>> users whose usernames contain national characters (encoded in UTF-8).
>>> HTTP header based authentication fails when the username or the
>>> password contains multibyte characters.
>>>
>>> The root of the bug is the Base64 decoder, which decodes the Base64
>>> stream into a char array: it converts each byte to an individual char,
>>> and this decode method corrupts the multibyte characters...
>>>
>> Hi.
>> Before declaring that this is a bug, I suggest that you read the other
>> thread entitled "mod_jk codepage in header values".
>> The main point is : according to the HTTP RFCs, a HTTP header value is
>> supposed to contain /only/ US-ASCII characters. Some byte values in
>> UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
>> and according to the RFC, HTTP headers which would contain them are
>> invalid.
>> It's a pain, but it's (probably) not a bug.
>
> In this case I think it is a bug. The authorisation header is base64
> encoded so it is automatically compliant with RFC2616.


Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something which puzzles me in the absolute.
Suppose that the browser and the server know nothing particular about
one another, and that the server gets such an Authorization header from
the browser.

The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, to be translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array?
If you use the default platform JVM encoding, you are making the 
assumption that the browser knew what this encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate in which 
encoding this string was, before it encoded it using Base64, or did it ?
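One possible pragmatic answer to that question, sketched in Java: a heuristic (not something the RFCs define) that tries strict UTF-8 first and falls back to ISO-8859-1, which accepts every octet sequence. `decodeUtf8OrLatin1` is a hypothetical helper. The guess can be wrong: ISO-8859-1 text whose octets happen to form valid UTF-8, such as `Ã©` (octets C3 A9), would be read as `é`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class CharsetGuess {

    /**
     * Heuristic, not a spec rule: decode as strict UTF-8 and fall back to
     * ISO-8859-1 if the octets are not well-formed UTF-8.
     */
    static String decodeUtf8OrLatin1(byte[] octets) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(octets)).toString();
        } catch (CharacterCodingException e) {
            // Not well-formed UTF-8: ISO-8859-1 maps each octet to one char, so it cannot fail.
            return new String(octets, StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        String name = "\u00FAser"; // "user" with an acute accent on the u
        byte[] fromUtf8Client = name.getBytes(StandardCharsets.UTF_8);        // C3 BA 73 65 72
        byte[] fromLatin1Client = name.getBytes(StandardCharsets.ISO_8859_1); // FA 73 65 72
        System.out.println(decodeUtf8OrLatin1(fromUtf8Client).equals(name));   // true
        System.out.println(decodeUtf8OrLatin1(fromLatin1Client).equals(name)); // true
    }
}
```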





Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas

On 21/01/2010 06:12, André Warnier wrote:
> Auth Gábor wrote:
>> Hi,
>>
>> I've found a potential bug in the Basic Authentication module. I have
>> users whose usernames contain national characters (encoded in UTF-8).
>> HTTP header based authentication fails when the username or the
>> password contains multibyte characters.
>>
>> The root of the bug is the Base64 decoder, which decodes the Base64
>> stream into a char array: it converts each byte to an individual char,
>> and this decode method corrupts the multibyte characters...
>>
> Hi.
> Before declaring that this is a bug, I suggest that you read the other
> thread entitled "mod_jk codepage in header values".
> The main point is : according to the HTTP RFCs, a HTTP header value is
> supposed to contain /only/ US-ASCII characters. Some byte values in
> UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
> and according to the RFC, HTTP headers which would contain them are
> invalid.
> It's a pain, but it's (probably) not a bug.

In this case I think it is a bug. The authorisation header is base64
encoded so it is automatically compliant with RFC2616.

Mark


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Auth Gábor

Hi,

André Warnier wrote:
>> I've found a potential bug in the Basic Authentication module. I have
>> users whose usernames contain national characters (encoded in UTF-8).
>> HTTP header based authentication fails when the username or the
>> password contains multibyte characters.
>>
>> The root of the bug is the Base64 decoder, which decodes the Base64
>> stream into a char array: it converts each byte to an individual char,
>> and this decode method corrupts the multibyte characters...
> Before declaring that this is a bug, I suggest that you read the other
> thread entitled "mod_jk codepage in header values".

  I've read that.

> The main point is : according to the HTTP RFCs, a HTTP header value is
> supposed to contain /only/ US-ASCII characters. Some byte values in
> UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking
> and according to the RFC, HTTP headers which would contain them are
>  invalid. It's a pain, but it's (probably) not a bug.

Hmm... the Basic Authorization header looks like this:
Authorization: BASIC w7pzZXJfMDA3MjpqZWxzem8xMkFB

Where do you see a non-US-ASCII character in the header? :)

Gábor Auth
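Gábor's point can be checked with a small Java sketch: every character of the header is US-ASCII, and the charset question only appears after Base64 decoding. By my decoding, the token yields the octets C3 BA followed by `ser_0072:jelszo12AA`; C3 BA is the UTF-8 encoding of `ú`. The class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class GaborHeaderExample {

    // Credentials token copied from the header quoted above.
    static final String TOKEN = "w7pzZXJfMDA3MjpqZWxzem8xMkFB";

    /** True if every character is 7-bit US-ASCII. */
    static boolean isAscii(String s) {
        return s.chars().allMatch(c -> c < 0x80);
    }

    /** Decodes the token with the given charset and returns the part before the first ':'. */
    static String userid(Charset charset) {
        String userPass = new String(Base64.getDecoder().decode(TOKEN), charset);
        return userPass.substring(0, userPass.indexOf(':'));
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Authorization: BASIC " + TOKEN)); // true
        // Prints U+00FA followed by "ser_0072"
        System.out.println(userid(StandardCharsets.UTF_8));
        // Prints U+00C3 U+00BA followed by "ser_0072": two UTF-8 octets became two chars
        System.out.println(userid(StandardCharsets.ISO_8859_1));
    }
}
```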


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread Mark Thomas

On 21/01/2010 05:54, Auth Gábor wrote:
> Hi,
> 
> I've found a potential bug in the Basic Authentication module. I have users
> whose usernames contain national characters (encoded in UTF-8).
> HTTP header based authentication fails when the username or the
> password contains multibyte characters.

That sounds like a bug to me.

> The root of the bug is the Base64 decoder, which decodes the Base64 stream
> into a char array: it converts each byte to an individual char, and this
> decode method corrupts the multibyte characters...

And that sounds like the root cause.

> It works, because the byte[] to String conversion supports the multibyte 
> conversion and uses the encoding of the JVM.
> 
> What do you think about it?

I haven't tested it or looked at the detail of the base 64 decoding but
on the basis it works for you then...

Great! Many thanks. Please create a Bugzilla entry and add your patch to
it. Patches sent to the mailing list are too easy to forget.

Before you do, I have one improvement suggestion. Using the
platform default encoding to convert bytes to String is something that
itself has caused bugs in the past and I can see it doing so here too.
I'd suggest adding a characterEncoding attribute to the
BasicAuthenticator (like there is for FormAuthenticator). Don't forget
to document this new attribute in your patch.

The tricky question is what the default should be. I see the options as
ISO-8859-1 or UTF-8. I'd use UTF-8 since that will work for most input
including all ISO-8859-1 input.
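The suggested attribute could look roughly like the following Java sketch. `BasicCredentialsDecoder` and `setCharacterEncoding` are hypothetical names, not Tomcat's API. Like the patch, it finds the colon in the raw octets before converting to characters, which is safe for UTF-8 and ISO-8859-1 because 0x3A never occurs inside a UTF-8 multibyte sequence. One caveat on the default: UTF-8 can represent every ISO-8859-1 character, but a client that actually sends ISO-8859-1 octets for a non-ASCII character (0xFA for `ú`, say) would not decode correctly under a UTF-8 default.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical Basic-auth credentials decoder with a configurable charset (default UTF-8). */
final class BasicCredentialsDecoder {

    private Charset charset = StandardCharsets.UTF_8; // never the platform default

    /** Mirrors the suggested "characterEncoding" attribute; throws for unknown names. */
    void setCharacterEncoding(String name) {
        this.charset = Charset.forName(name);
    }

    /** Decodes a Base64 token into {username, password}; password is null if there is no ':'. */
    String[] decode(String base64Token) {
        byte[] octets = Base64.getDecoder().decode(base64Token.trim());
        // Split on the first ':' octet before converting to chars. In UTF-8 every
        // octet of a multibyte sequence is >= 0x80, so 0x3A can only be a real ':'.
        int colon = -1;
        for (int i = 0; i < octets.length; i++) {
            if (octets[i] == ':') {
                colon = i;
                break;
            }
        }
        if (colon < 0) {
            return new String[] {new String(octets, charset), null};
        }
        String username = new String(octets, 0, colon, charset);
        String password = new String(octets, colon + 1, octets.length - colon - 1, charset);
        return new String[] {username, password};
    }

    public static void main(String[] args) {
        BasicCredentialsDecoder decoder = new BasicCredentialsDecoder();
        String[] creds = decoder.decode("w7pzZXJfMDA3MjpqZWxzem8xMkFB");
        System.out.println(creds[1]); // jelszo12AA
    }
}
```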

Thanks again for the patch.

Mark


Re: Basic Authentication Failed with multibyte username

2010-01-21 Thread André Warnier


Auth Gábor wrote:
> Hi,
>
> I've found a potential bug in the Basic Authentication module. I have users
> whose usernames contain national characters (encoded in UTF-8).
> HTTP header based authentication fails when the username or the
> password contains multibyte characters.
>
> The root of the bug is the Base64 decoder, which decodes the Base64 stream
> into a char array: it converts each byte to an individual char, and this
> decode method corrupts the multibyte characters...

Hi.
Before declaring that this is a bug, I suggest that you read the other 
thread entitled "mod_jk codepage in header values".
The main point is : according to the HTTP RFCs, a HTTP header value is 
supposed to contain /only/ US-ASCII characters. Some byte values in 
UTF-8 encoding are /not/ valid US-ASCII characters, so strictly speaking 
and according to the RFC, HTTP headers which would contain them are invalid.

It's a pain, but it's (probably) not a bug.



Basic Authentication Failed with multibyte username

2010-01-21 Thread Auth Gábor

Hi,

I've found a potential bug in the Basic Authentication module. I have users
whose usernames contain national characters (encoded in UTF-8).
HTTP header based authentication fails when the username or the
password contains multibyte characters.

The root of the bug is the Base64 decoder, which decodes the Base64 stream into a
char array: it converts each byte to an individual char, and this decode method corrupts
the multibyte characters...
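The effect described here can be reproduced with a small Java sketch. `perByte` is a simplified, hypothetical stand-in for a decoder that stores each decoded octet in its own char (not Tomcat's actual code); it behaves exactly like decoding as ISO-8859-1, so a two-octet UTF-8 character becomes two chars.

```java
import java.nio.charset.StandardCharsets;

final class PerByteCastDemo {

    /** Simplified stand-in for a decoder that stores each decoded octet in its own char. */
    static String perByte(byte[] octets) {
        char[] chars = new char[octets.length];
        for (int i = 0; i < octets.length; i++) {
            // Mask to 0..255 so octet 0xC3 becomes U+00C3 rather than U+FFC3.
            chars[i] = (char) (octets[i] & 0xFF);
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        byte[] utf8 = "G\u00E1bor".getBytes(StandardCharsets.UTF_8); // 6 octets for 5 characters
        System.out.println(perByte(utf8).length());                            // 6: corrupted
        System.out.println(new String(utf8, StandardCharsets.UTF_8).length()); // 5: correct
    }
}
```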

Here is the patch:
===================================================================
Index: java/org/apache/catalina/util/Base64.java
===================================================================
--- java/org/apache/catalina/util/Base64.java   (revision 901368)
+++ java/org/apache/catalina/util/Base64.java   (working copy)
@@ -283,5 +283,84 @@
         }
     }
 
+    /**
+     * Decodes Base64 data into octets
+     *
+     * @param base64DataBC Byte array containing Base64 data
+     * @param decodedDataBC The decoded data bytes
+     */
+    public static void decode( ByteChunk base64DataBC, ByteChunk decodedDataBC)
+    {
+        int start = base64DataBC.getStart();
+        int end = base64DataBC.getEnd();
+        byte[] base64Data = base64DataBC.getBuffer();
+
+        decodedDataBC.recycle();
+
+        // handle the edge case, so we don't have to worry about it later
+        if(end - start == 0) { return; }
+
+        int  numberQuadruple= (end - start)/FOURBYTE;
+        byte b1=0,b2=0,b3=0, b4=0, marker0=0, marker1=0;
+
+        // Throw away anything not in base64Data
+
+        int encodedIndex = 0;
+        int dataIndex = start;
+        byte[] decodedData = null;
+
+        {
+            // this sizes the output array properly - rlw
+            int lastData = end - start;
+            // ignore the '=' padding
+            while (base64Data[start+lastData-1] == PAD)
+            {
+                if (--lastData == 0)
+                {
+                    return;
+                }
+            }
+            decodedDataBC.allocate(lastData - numberQuadruple, -1);
+            decodedDataBC.setEnd(lastData - numberQuadruple);
+            decodedData = decodedDataBC.getBuffer();
+        }
+
+        for (int i = 0; i < numberQuadruple; i++)
+        {
+            dataIndex = start + i * 4;
+            marker0   = base64Data[dataIndex + 2];
+            marker1   = base64Data[dataIndex + 3];
+
+            b1 = base64Alphabet[base64Data[dataIndex]];
+            b2 = base64Alphabet[base64Data[dataIndex +1]];
+
+            if (marker0 != PAD && marker1 != PAD)
+            {
+                //No PAD e.g 3cQl
+                b3 = base64Alphabet[ marker0 ];
+                b4 = base64Alphabet[ marker1 ];
+
+                decodedData[encodedIndex]   = (byte) ((  b1 <<2 | b2>>4 ) & 0xff);
+                decodedData[encodedIndex + 1] =
+                    (byte) ((((b2 & 0xf)<<4 ) |( (b3>>2) & 0xf) ) & 0xff);
+                decodedData[encodedIndex + 2] = (byte) (( b3<<6 | b4 ) & 0xff);
+            }
+            else if (marker0 == PAD)
+            {
+                //Two PAD e.g. 3c[Pad][Pad]
+                decodedData[encodedIndex]   = (byte) ((  b1 <<2 | b2>>4 ) & 0xff);
+            }
+            else if (marker1 == PAD)
+            {
+                //One PAD e.g. 3cQ[Pad]
+                b3 = base64Alphabet[ marker0 ];
+
+                decodedData[encodedIndex]   = (byte) ((  b1 <<2 | b2>>4 ) & 0xff);
+                decodedData[encodedIndex + 1] =
+                    (byte) ((((b2 & 0xf)<<4 ) |( (b3>>2) & 0xf) ) & 0xff);
+            }
+            encodedIndex += 3;
+        }
+    }
+
 }
Index: java/org/apache/catalina/authenticator/BasicAuthenticator.java
===================================================================
--- java/org/apache/catalina/authenticator/BasicAuthenticator.java  (revision 901368)
+++ java/org/apache/catalina/authenticator/BasicAuthenticator.java  (working copy)
@@ -161,18 +161,18 @@
             // FIXME: Add trimming
             // authorizationBC.trim();
 
-            CharChunk authorizationCC = authorization.getCharChunk();
-            Base64.decode(authorizationBC, authorizationCC);
+            ByteChunk authorizationBCC = authorization.getByteChunk();
+            Base64.decode(authorizationBC, authorizationBCC);
 
             // Get username and password
-            int colon = authorizationCC.indexOf(':');
+            int colon = authorizationBCC.indexOf(':',0);
             if (colon < 0) {
-                username = authorizationCC.toString();
+                username = authorizationBCC.toString();
             } else {
-                char[] buf = authorizationCC.getBuffer();
+                byte[] buf = authorizationBCC.getBuffer();
                 username = new String(buf, 0, colon);
                 password = new String(buf, colon + 1,
-                        authorizationCC.getEnd() - colon - 1);
+
