Curtis, Thank you for your observations and compliments (first paragraph). No your right, I haven't been using taint (*I hang my head in shame*).
I probably would have done a few things slightly differently if I had a perfect vision I was I was creating when I began it. You see on the parse_mform_input vs parse_form_input, I did up until very recently have two functions, Form() and MimeForm(). Form() called parse_form_input and MimeForm() called parse_mform_input, but I did take into account at the time of writing it that somebody may use the MimeForm() function when it was not truly mime. So I made it just work in this instance. Within the last month is when I realized that I should just make one function that deals with both. So I trimmed it to just Form(). I think you are right, I will re-arrange the subs a little, split a few more out, perform the mime check before choosing a sub specifying that its mime and so on. So if I understand correctly, you would recommend something more to the effect of ---------------------------------- if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) { $boundary = $1; # Using MIME to split out the form elements. $boundary = '--'.$boundary if ($input =~ /--$boundary/); ---------------------------------- Hmm, "Death to Dot Star!" a good read, but the reason I did this: ---------------------------------- if ($listitem =~ / name=\"(.*)\"; filename=\"(.*?)\"[\r\n]{2}/){ ---------------------------------- the way I did was so that if name or filename contained a quote (though I'm not sure its possible they can, but if what you say is true, netscape may), it would get matched. You see the ' name=\"' verifies I'm starting where I think I am, the '\"; filename=\"' verifies I'm ending and beginning the next one where I intend to and the '\"[\r\n]{2}' confirms I will finish my matching where I've expected. The question mark in the filename is probably not necessary, and If I was to stick with the syntax above, I should probably add a ? to the name match as well. Now having had that said, What do you think about my reasoning on this issue? But yes, I had thought this regex through many times to come up with a solution that would catch anything I may encounter (especially quotes). > I understand that some user agents do not wrap the values in quote marks So it should be more like ---------------------------------- if ($listitem =~ / name=\"{0,1}(.*?)\"{0,1}; filename=\"{0,1}(.*?)\"{0,1}[\r\n]{2}/){ ---------------------------------- in order to catch the correct data where it may or may not be wrapped in quotes. >What is the CRLF sequence that the system that *this* version of your program is >using? Well the reason I did '[\r\n]{2}' was to catch any combination of these that the system may be using because on a windows server this is different than on a Linux box. I will check how CGI.pm does it to get some other ideas. Thanks, David ----- Original Message ----- From: "Ovid" <[EMAIL PROTECTED]> To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]> Sent: Thursday, July 04, 2002 2:27 PM Subject: Re: Form.pm (next installment of code review) Hi David, While I raise quite a few issues here, I have to say that I've rarely seen anyone present a CGI.pm alternative that supports uploads. I'm quite impressed, so keep that in mind while I pick it apart :) I'll ignore anything I may have covered in my discussion about the previous subroutine. Further, there are quite a few thing that I would have done differently, but I'll try to focus specifically on those issues that I consider bugs (and I'll skip some issues that I *think* are problems but I'd prefer to research further before shooting my mouth off :). I also won't bother on commenting on issues that I've seen you mention (DOS attacks, for instance). sub parse_mform_input($) { my $input = shift; my (%form_vars, @name_file, @list, %var_count); if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) { # snip } else { # if it is not mime return(parse_form_input($input)); # call the non-mime parser } } Either you are parsing multipart/formdata (MFD) or you are not. Your sub name implies that this is for MFD, but then goes on to handle non-MFD. Don't throw curve balls at programmers :) if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) { $boundary = '--'.$1; # Using MIME to split out the form elements. It turns out that IE 3.01 on Macintosh leaves off the initial '--' on the boundary. Mac users with this version of IE won't be able to upload files. This may not seem like much of an issue, but it's important to realize that there can be numerous OS/server/browser issues and for a CGI form handling routine to be truly robust, it should attempt to address as many of these issues as it can. If you want your code to be accepted for this, it has to be robust. You'll get mysterious sporadic failures that can be very hard to debug if you don't make sure to handle all of the special cases. Further, if someone using your code notices odd, intermittant failures that they don't get with CGI.pm, they'll probably *have* to go with CGI.pm. Fast code that doesn't work almost always loses to slow code that does work. # split out each form variable into @list @list = split(/$boundary/, $input); I see you're just going ahead and processing the data without any sanity checks. What happens if someone uploads a large file and hits "Esc" about halfway through? Does every Web server check this? If your's doesn't, $ENV{CONTENT_LENGTH} is not going to match the length of the data. You need to check this to ensure that you're not processing "partial" data and thereby risking some data corruption. foreach $listitem (@list){ Again, you appear to not be using strict. # if the data has a filename in it then if ($listitem =~ / name=\"(.*)\"; filename=\"(.*?)\"[\r\n]{2}/){ I see some a bad regex here. Items: 1. I understand that some user agents do not wrap the values in quote marks. Also, some versions of Netscape don't escape quotes in filenames. 2. The dot star in your regex is bad. Read "Death to Dot Star!" for more information. http://www.perlmonks.org/index.pl?node_id=24640. This can cause slow regexes, or worse, buggy ones. 3. You need to define the CRLF sequence. What is the CRLF sequence that the system that *this* version of your program is using? Not every system is going to use the same one. Read through CGI.pm and see how it's handled there. You can search on $CRLF. # if the file name is not empty now if ($form_file{'file_name'} ne "") { # if the file is larger than 0 bytes if (length($form_file{'file_content'}) > 0){ # creating a garanteed unique file name Rather than going through all of you code to guarantee a unique name (which will be thrown away anyway), why not just do some variation of this: use Digest::MD5; my $temp_file; do { $temp_file = Digest::MD5::md5_hex( $ENV{REMOTE_ADDR}, rand() ); } until ( ! -e $upload_dir.$temp_file ); Much cleaner and easier to understand, though I confess that I was commenting on style and not a bug, per se. Also, I think you are overcommenting. Your last snippet testing that the filename isn't empty and the content length is greater than zero are good examples. Much of the rest of the code has issues similar to what I've outlined above. I would also strongly recommend breaking your subroutines up into much smaller chunks. They're easier to understand that way. Cheers, Curtis "Ovid" Poe ===== "Ovid" on http://www.perlmonks.org/ Someone asked me how to count to 10 in Perl: push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//; shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A __________________________________________________ Do You Yahoo!? Sign up for SBC Yahoo! Dial - First Month Free http://sbc.yahoo.com -- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]