Dave G wrote:

> If that text is not properly validated and escaped, you could be open to SQL injection attacks.

I'm less clear on what "properly escaped" means. I thought
escaping was a matter of putting slashes before special characters, so
that their presence doesn't confuse the SQL queries one might run. Is it
possible that, even after taking at least that much precaution, a user
could still enter a malicious script held in a TEXT column?

Escaping the data so it's safe to put into a database query is only part of the solution. How the data should be escaped or validated also depends on how it goes into the query.


If you have

WHERE id = $id

then you need to ensure $id is a number and only a number. 1, 100, 10.5, -14.56 and 5.54E06 are all valid values for $id in this case. is_integer(), is_numeric() and using (int) or (float) to cast values ($id = (int)$id) help here.
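
Here's a rough sketch of that numeric check. The helper name to_number_or_null() and the sample input are made up for illustration; only is_numeric() comes from PHP itself.

```php
<?php
// Hypothetical helper: return the value as a number, or null if it is not numeric.
function to_number_or_null($raw)
{
    // is_numeric() accepts strings such as "100", "10.5", "-14.56" and "5.54E06".
    if (!is_numeric($raw)) {
        return null;
    }
    // Adding 0 converts a numeric string to an int or a float as appropriate.
    return $raw + 0;
}

$id = to_number_or_null('42; DROP TABLE users');
if ($id === null) {
    // Reject the request instead of building the query.
    echo "Invalid id\n";
} else {
    echo "WHERE id = $id\n";
}
```

If you only ever expect whole numbers, a plain (int) cast is simpler, but it silently turns garbage into 0 rather than rejecting it.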

If you have

WHERE name = '$name'

in the query, then you need to ensure any single quotes within $name are escaped according to your database. MySQL uses backslashes, so you can use addslashes() to escape the value of $name. Other databases use another single quote, so you need O''Kelly instead of O\'Kelly. To further complicate things, you have to take into account the magic_quotes_gpc setting. If that's enabled, PHP will have already escaped any incoming GET/COOKIE/POST/REQUEST data using addslashes(). So if you run addslashes() again, your data will be escaped twice.

The thing to remember is that if you put O\'Kelly into the database, you should be seeing O'Kelly inside the database when doing a SELECT. The \ is simply there to escape the quote upon executing the query. If you see O\'Kelly actually in your database, then you're escaping your data twice. If you find you have to use stripslashes() when you pull data from your database (you shouldn't have to use it), then you're escaping data twice OR you may have magic_quotes_runtime enabled (which will escape data coming back out of databases and files, although this is off by default).
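
To make the double-escaping problem concrete, here is a small sketch. The helpers escape_once() and escape_by_doubling() are hypothetical, and the $already_escaped argument simply stands in for the magic_quotes_gpc setting (which was removed in PHP 5.4). Note that addslashes() is not aware of the connection's character set, so your database library's own escaping function or prepared statements are generally the safer choice.

```php
<?php
// Hypothetical helper: escape a value for a single-quoted SQL string exactly once.
// $already_escaped is an assumption standing in for the magic_quotes_gpc setting.
function escape_once($value, $already_escaped)
{
    if ($already_escaped) {
        // Undo PHP's automatic escaping so we start from the raw value.
        $value = stripslashes($value);
    }
    return addslashes($value);
}

// Variant for databases that double the quote instead of using a backslash.
function escape_by_doubling($value)
{
    return str_replace("'", "''", $value);
}

echo escape_once("O'Kelly", false), "\n";   // O\'Kelly
echo escape_once("O\\'Kelly", true), "\n";  // O\'Kelly (escaped once, not twice)
echo addslashes("O\\'Kelly"), "\n";         // O\\\'Kelly (the double-escaped mess)
echo escape_by_doubling("O'Kelly"), "\n";   // O''Kelly
```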

If you have

WHERE name = "$name"

in your query, then you need to ensure double quotes are escaped within $name. addslashes() and magic_quotes_gpc will take care of single and double quotes, though, so you're covered there. A lot of people think that you only need to escape single quotes, but it really depends on how you write your queries.

Now that the data is safely in the database, you'll eventually want to display it back to the user, right? Again, you need to ensure the data is escaped (or more properly - encoded) so that any HTML/JavaScript/etc within the data is not rendered on your page (unless you really want it to). If the data came from the user, then you DO NOT want it to render, trust me.

Now, if you're validating everything to be a number or, say, 5 characters, then there's no real malicious code that could be inserted to be rendered on your page. However, the thing to realize is that, sure, you're only allowing 5 characters now. Tomorrow your partner comes along and decides to allow 50 characters. He changes your substr() call to chop it to 50 characters and changes the database column. Now, since you weren't encoding the data before you displayed it back to the user, you could be in trouble. The moral is that it really wouldn't hurt to encode a string that you know will only be 5 characters, just to cover things if they ever change.

So how is this encoding done? htmlentities() is your best friend. When you retrieve data from the database/file, you run it through htmlentities() before putting it on your web page. So something like <img> supplied by the user will be sent as &lt;img&gt; in the HTML source. The user will actually see "<img>" instead of an image box and a possibly distasteful image.
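
A minimal sketch of encoding at output time. The variable names and sample value are made up, and the 'UTF-8' charset argument is an assumption; use whatever encoding your page is actually served in.

```php
<?php
// Value as it might come back from the database (user-supplied, untrusted).
$comment = 'Nice site <img>';

// Encode right before output so the browser shows the markup as plain text.
// ENT_COMPAT encodes double quotes only; 'UTF-8' is an assumed page encoding.
$safe = htmlentities($comment, ENT_COMPAT, 'UTF-8');

echo $safe, "\n";  // Nice site &lt;img&gt;
```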

Another use for htmlentities() is for when you display data back to the user in a form <input> element. This is pretty common for when you want to redisplay a form with the data the user gave so they can edit it, correct it, whatever. Normally, you'll see someone do this:

<input type="text" name="name" value="<?=$name?>">

Well, what if the value of $name contains a double quote?

<input type="text" name="name" value="a double " quote">

That "HTML" will confuse the browser. It'll see "a double " as the value of the <input> element and quote" as an unrecognized attribute. Now, that doesn't really cause any harm, you just lose some text. But if the user can supply a value beginning with "> (such as ">My HTML<img>), then they've just ended your <input> element, and anything after it will be rendered as HTML.

<input type="text" name="name" value="">My HTML<img>">

Now you're letting them write any HTML/JavaScript/etc they want into your page. This would allow them to inject JavaScript from a remote site, redirect users, and steal cookie values. The PHP session id is saved in a cookie. Once I have that session id, I can hijack your session by providing the same session id when requesting a page on your site.

Again, htmlentities() is your friend here. Using it on the value above would give you this.

<input type="text" name="name" value="&quot;&gt;My HTML&lt;img&gt;">

The "&quot;" will be in the HTML source, but it'll actually show up as a double quote " in the text box. Try it. :)

Now you're safe, right? You forgot about your partner again. Now he comes along and decides to change the <input> element. He just happens to like using single quotes, btw.

<input type='text' name='name' value='<?=$name?>'>

htmlentities() will only escape double quotes by default (that was the default, ENT_COMPAT, in PHP versions before 8.1). So now malicious users can use a single quote to break out of your value attribute and inject HTML/JavaScript/whatever. All is not lost, though: there are some extra flags you can pass to htmlentities() to tell it how to handle quotes. Using

$name = htmlentities($name,ENT_QUOTES);

will take care of encoding single and double quotes. You saved yourself from your partner again. :)
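
Putting that together for the form case, here's a sketch of a tiny helper that always encodes with ENT_QUOTES. The function name text_input() is hypothetical and the 'UTF-8' charset is an assumption about your page.

```php
<?php
// Hypothetical helper: build a text <input> with encoded attribute values.
function text_input($name, $value)
{
    // ENT_QUOTES encodes both " and ', so either attribute quoting style is safe.
    $n = htmlentities($name, ENT_QUOTES, 'UTF-8');
    $v = htmlentities($value, ENT_QUOTES, 'UTF-8');
    return "<input type='text' name='$n' value='$v'>";
}

echo text_input('name', "'>My HTML<img>"), "\n";
// <input type='text' name='name' value='&#039;&gt;My HTML&lt;img&gt;'>
```

Keeping the encoding inside one helper means your partner can switch quoting styles in the template without reopening the hole.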

All safe, right? I don't think so. It's a never ending battle. We haven't even gotten into mail header injection or system calls yet, among other things. Maybe I'll let Chris explain those. ;)

General rules to follow:

1. Never trust user data. That means don't use it in a query or display it to the user without validating/escaping/encoding it first.

2. Validate for "known good" characters. Don't strip out < and > characters because they're "bad" because you could be missing other bad characters that'll allow malicious actions to take place. Even if you think you've got them all, something may change. Decide what is "good", i.e. alphanumeric, numbers only, [A-Z!-], etc...
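
As a sketch of rule 2, a whitelist check might look like the following. The helper name is_known_good(), the allowed character set and the length limit are all assumptions; decide what "good" means for each field.

```php
<?php
// Hypothetical helper: accept only values made entirely of "known good" characters.
function is_known_good($value, $max_length)
{
    // Allowed here: letters, digits, space and hyphen (an assumed policy).
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return is_string($value)
        && strlen($value) <= $max_length
        && preg_match('/\A[A-Za-z0-9 -]+\z/', $value) === 1;
}

var_dump(is_known_good('John Holmes', 50));  // bool(true)
var_dump(is_known_good('<script>', 50));     // bool(false)
```

Passing validation doesn't replace escaping for the query or encoding for the page; it just narrows what can reach them.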

I'm sure there are more, but that's enough typing for now. :)

--
---John Holmes...

--
PHP General Mailing List (http://www.php.net/)