SUMMARY

By default, regexes shouldn't untaint. Also, provide a toolkit for
Safer Untainting.

DETAILS

We're all aware of how you go about untainting data: run it through a
regex and grab the stuff that was in the parens:

    $var =~ m/^(\w+)$/ or die 'unsafe data';
    $var = $1;

Untainting as in the example above makes sense, but I've always been
uncomfortable with the fact that $1 is always untainted, even if that
wasn't my intent in using parens. Consider this example of parsing a
string of fixed-length records:

    @fields = $raw =~ m/^(.{3})(.{25})(.{30})/;

The data that comes back from that expression is always untainted.
Frankly, few non-advanced programmers are going to notice that. Yes,
there are techniques for retainting, but when you get to that point it
starts feeling like you're working against the language, not with it.

Ergo, I propose that regexes only untaint stuff in parens if you
specifically tell them to do so. A capital-T switch would work nicely:

    $var =~ m/^(\w+)$/T or die 'unsafe data';

The T regex switch will help lazy programmers like me, but there's more
that could be done to encourage responsible untainting. As has been
pointed out ad nauseam, /(.*)/ unravels the whole taint sweater.
Granted, anybody who uses tainting but also uses that regex is just
plain goofy, but more subtle mistakes still happen, like filtering out
the bad instead of filtering in the good.

So I'd like to propose that a standard module be included with Perl
that does a series of common, useful taint checks, and furthermore that
use of that untaint module be encouraged as the beginner's tool for
untainting. Here are some examples of how it would work:

    use Untaint ':all';

    # Untaint $var only if $var =~ m/^(\w+)$/
    # Why is that the default?  See below.
    untaint($var);

    # Same as default
    untaint($var, -format => Untaint::WordOnly);

    # Untaint $var only if $var is the path to a
    # file that already exists.
    untaint($var, -format => Untaint::FileExists);

    # Untaint $var only if $var is the path to a file, and
    # that file is within /tmp/myfiles
    untaint($var, -format => Untaint::FileExists, -tree => '/tmp/myfiles');

    # Untaint $var unconditionally, which is dangerous
    untaint($var, -format => Untaint::Dangerous);

    # I'm sure we could think of many others

Why do I propose that the default for untaint is to only untaint vars
that consist solely of word characters? I do so because it puts into
beginners' heads the idea that untainting is something you only do to
carefully screened data, and furthermore because word-char-only strings
are one of the most common and safest forms of untainting (IMHO).
untaint would include the ability to untaint everything, but you have
to acknowledge that it's Untaint::Dangerous to do so.

-Miko
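
A minimal sketch of how the proposed untaint() could behave in today's
Perl, offered as an illustration and not as the module's design. It
assumes format names are passed as plain strings (the proposal writes
them as Untaint::WordOnly and so on), that untaint() modifies its first
argument in place and returns true or false, and it leaves out the
-tree option. The %FORMATS table and everything in it are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sketch: the format names follow the proposal, but this is
# not an existing module. Each check returns the vetted value taken from
# a regex capture, or undef if the value is rejected.
my %FORMATS = (
    # The proposal's default check: word characters only.
    WordOnly => sub {
        my ($value) = @_;
        return $value =~ m/^(\w+)$/ ? $1 : undef;
    },
    # Accept only paths to plain files that already exist.
    FileExists => sub {
        my ($value) = @_;
        return undef unless -f $value;
        return $value =~ m/^(.+)$/s ? $1 : undef;
    },
    # Accept anything; the caller has to spell out "Dangerous".
    Dangerous => sub {
        my ($value) = @_;
        return $value =~ m/^(.*)$/s ? $1 : undef;
    },
);

# untaint($var, -format => 'Name')
# Replaces $var with the vetted copy and returns 1 on success;
# leaves $var alone and returns 0 on failure.
sub untaint {
    my (undef, %opt) = @_;
    my $format = $opt{-format} // 'WordOnly';
    my $check  = $FORMATS{$format} or die "unknown format '$format'";

    return 0 unless defined $_[0];
    my $clean = $check->($_[0]);
    return 0 unless defined $clean;

    $_[0] = $clean;    # @_ aliases the caller's variable
    return 1;
}

my $name = 'report_2024';
print "accepted: $name\n" if untaint($name);

my $cmd = 'rm -rf /';
print "rejected\n" unless untaint($cmd);
```

Returning a status instead of dying lets the caller decide how to handle
rejected data; a version that dies on failure would be an equally
reasonable choice.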