Character encoding for POST x-www-form-urlencoding (a success story)

2010-02-12 Thread Christopher Schultz

All,

My company recently decided to alter our password complexity
requirements for our webapp, and I got to implement the changes. What fun!

We use a regular expression to enforce our password complexity, and it
needed to be changed. Since we are starting to branch out into
populations that aren't necessarily using written English everywhere, I
chose to change our naive [a-z]- and [A-Z]-type checking to the more
enlightened \p{Ll} and \p{Lu}, respectively. (Readers' note: jakarta-oro
does not support this notation, so you'll want to use Java's built-in
regular expression support to do this.)
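
As a minimal sketch of the kind of check described above, the following
uses java.util.regex with Unicode general categories. The class name,
the method name, and the policy itself (one lowercase letter, one
uppercase letter, one digit) are assumptions made for illustration; the
actual complexity rules are not given in this message.

```java
import java.util.regex.Pattern;

final class PasswordComplexity {
    // Hypothetical policy: at least one lowercase letter, one uppercase
    // letter, and one decimal digit, in any script Unicode knows about.
    private static final Pattern LOWER = Pattern.compile("\\p{Ll}");
    private static final Pattern UPPER = Pattern.compile("\\p{Lu}");
    private static final Pattern DIGIT = Pattern.compile("\\p{Nd}");

    // The old ASCII-only style of check, kept for comparison.
    private static final Pattern ASCII_LOWER = Pattern.compile("[a-z]");

    static boolean isComplexEnough(String password) {
        return LOWER.matcher(password).find()
            && UPPER.matcher(password).find()
            && DIGIT.matcher(password).find();
    }

    static boolean hasAsciiLowercase(String password) {
        return ASCII_LOWER.matcher(password).find();
    }

    public static void main(String[] args) {
        // '1', Greek small pi (U+03C0), Greek capital omega (U+03A9)
        String greek = "1\u03C0\u03A9";
        System.out.println(isComplexEnough(greek));   // true
        System.out.println(hasAsciiLowercase(greek)); // false
        System.out.println(isComplexEnough("1abc"));  // false: no uppercase
    }
}
```

The point of the comparison is that the ASCII ranges reject a perfectly
good lowercase letter such as pi, while \p{Ll} accepts it.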

Anyhow, when making changes to things security-related, it pays to test
/everything/, so I grabbed 4 other people from my group and had them
each test 15 sample passwords against our 6 different forms that accept
password-change entry. Everything went fine.

Except when I then tried to log in from our home page with the password
1πππππππ (that's a '1' digit followed by 7 Greek Pi characters, in
case your email reader can't render that), and I got a failure. I
figured I must have fat-fingered something, so I tried again and all was
well.

My spidey-sense tingling, I logged out and repeated the process: again,
my first login attempt was unsuccessful, while the second succeeded. Hmm.
Upon closer inspection, our opening page is a static HTML file served by
Apache httpd -- no Tomcat involvement. After a failed login, a page that
looks exactly like the home page is sent to the user, but it's
different: /and/ it's served by Tomcat.

The difference was that the original request's response (for
/index.html) had a Content-Type of text/html, while the failed login
had a response Content-Type of text/html; charset=UTF-8.

It's our old pal "what's the default encoding, again?" coming back to
haunt me, and here I am telling people on this list that they just don't
understand the history of the web and how to do things properly.
Evidently, I wasn't doing them properly, either.

After all those complaints about the way that URL-encoded GET parameters
can get messed up based upon Content-Type and encoding guesses, etc., the
stock answer that "the solution is just to use POST" is, well, only half
the truth. Yes, POST gets you away from the browser's preference for what
encoding to use before URL-encoding the bytes, but with POST the
Content-Type is application/x-www-form-urlencoded, which means there's no
charset associated with it. :(
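
To make the failure mode concrete, here is a small sketch using
java.net.URLEncoder and java.net.URLDecoder. It only illustrates the
general mismatch (bytes encoded under one charset, decoded under
another); the class and method names are made up for the example, and
it does not claim to reproduce exactly what any particular browser or
Tomcat configuration does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

final class FormEncodingMismatch {
    // Percent-encode a form value using the given charset for the bytes.
    static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    // Percent-decode a form value, interpreting the bytes in the given charset.
    static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String password = "1\u03C0"; // '1' followed by Greek small pi

        // What a client sends if it encodes the form data as UTF-8.
        String wire = encode(password, "UTF-8");
        System.out.println(wire); // 1%CF%80

        // Nothing on the wire names the charset, so the server has to pick one.
        System.out.println(password.equals(decode(wire, "UTF-8")));      // true
        System.out.println(password.equals(decode(wire, "ISO-8859-1"))); // false
    }
}
```

The percent-encoded bytes are identical in both decode calls; only the
receiver's assumption differs, which is enough to turn a correct
password into a wrong one.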

So, what's to be done?

Well, I immediately thought of two solutions:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
and
<form accept-charset="UTF-8">

Knowing that web browsers are notoriously inconsistent with one another
regarding certain things, I was sure that I'd have a giant mess when it
came to testing, and that I'd have to figure out how to trick each
version of each browser into doing my bidding.

First, I had to make sure that they all /failed/ in the same way (that
is to say, that the login failed the way I expected it to fail), then I
had to see what magical incantations would be necessary to actually get
the login to succeed.

I'm happy to report that, for /all/ of the following browsers, */both/*
solutions worked!

Mozilla Firefox 2.0
Mozilla Firefox 3.0
Mozilla Firefox 3.5
Mozilla Firefox 3.6
Opera 9.6
Opera 10.10
Apple Safari 3.2
Apple Safari 4.0
Google Chrome 4.0
MSIE 6.0
MSIE 7.0
MSIE 8.0

I'm inclined to use the <form accept-charset="UTF-8"> solution, because
that does not involve lying to the browser about the encoding of the
actual HTML document. Instead, I'd rather advertise that I will only
accept UTF-8 encoding and leave it at that. Sadly, the client still
doesn't tell me that the underlying encoding being used to urlencode the
POST parameters is UTF-8, but at least they're doing what I want them to
do, and they all agree on behavior!
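
Since the request usually carries no charset, the receiving side has to
supply one. The sketch below shows one way to do that with only the JDK:
read a charset parameter from a Content-Type value if there is one, and
otherwise fall back to a default chosen by the application. The class
name, method name, and fallback policy are assumptions for illustration.
In a servlet container the analogous step is to call
ServletRequest.setCharacterEncoding before any parameters are read.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class RequestCharset {
    // Returns the charset named in a Content-Type header value, or the
    // fallback when the header is missing, has no charset parameter, or
    // names a charset this JVM does not know.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset utf8 = StandardCharsets.UTF_8;
        // Typical form POST: no charset parameter, so the fallback applies.
        System.out.println(charsetOf("application/x-www-form-urlencoded", utf8));
        // An explicit charset parameter wins over the fallback.
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", utf8));
    }
}
```

With accept-charset="UTF-8" on the form, a UTF-8 fallback on the server
matches what the browsers listed above were observed to send.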

So, score 1 for standards, at least in this instance.

-chris

-
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org



Re: Character encoding for POST x-www-form-urlencoding (a success story)

2010-02-12 Thread Xie Xiaodong
Very nice work. Thank you for sharing.



On Fri, Feb 12, 2010 at 11:23 PM, Christopher Schultz 
ch...@christopherschultz.net wrote:





-- 
Sincerely yours and Best Regards,
Xie Xiaodong