Curtis,
    Thank you for your observations and compliments (first paragraph).  No your right, 
I haven't
been using taint (*I hang my head in shame*).

    I probably would have done a few things slightly differently if I had a perfect 
vision I was I
was creating when I began it.  You see on the parse_mform_input vs parse_form_input, I 
did up until
very recently have two functions, Form() and MimeForm().  Form() called 
parse_form_input and
MimeForm() called parse_mform_input, but I did take into account at the time of 
writing it that
somebody may use the MimeForm() function when it was not truly mime.  So I made it 
just work in this
instance.  Within the last month is when I realized that I should just make one 
function that deals
with both.  So I trimmed it to just Form().  I think you are right, I will re-arrange 
the subs a
little, split a few more out, perform the mime check before choosing a sub specifying 
that its mime
and so on.

    So if I understand correctly, you would recommend something more to the effect of
----------------------------------
if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) {
    $boundary = $1;  # Using MIME to split out the form elements.
    $boundary = '--'.$boundary if ($input =~ /--$boundary/);
----------------------------------

    Hmm, "Death to Dot Star!" a good read, but the reason I did this:
----------------------------------
if ($listitem =~ / name=\"(.*)\"; filename=\"(.*?)\"[\r\n]{2}/){
----------------------------------
the way I did was so that if name or filename contained a quote (though I'm not sure 
its possible
they can, but if what you say is true, netscape may), it would get matched.  You see 
the ' name=\"'
verifies I'm starting where I think I am, the '\"; filename=\"' verifies I'm ending 
and beginning
the next one where I intend to and the '\"[\r\n]{2}' confirms I will finish my 
matching where I've
expected.  The question mark in the filename is probably not necessary, and If I was 
to stick with
the syntax above, I should probably add a ? to the name match as well.
    Now having had that said, What do you think about my reasoning on this issue?  But 
yes, I had
thought this regex through many times to come up with a solution that would catch 
anything I may
encounter (especially quotes).

> I understand that some user agents do not wrap the values in quote marks
    So it should be more like
----------------------------------
if ($listitem =~ / name=\"{0,1}(.*?)\"{0,1}; filename=\"{0,1}(.*?)\"{0,1}[\r\n]{2}/){
----------------------------------
in order to catch the correct data where it may or may not be wrapped in quotes.

>What is the CRLF sequence that the system that *this* version of your program is 
>using?
    Well the reason I did '[\r\n]{2}' was to catch any combination of these that the 
system may be
using because on a windows server this is different than on a Linux box.  I will check 
how CGI.pm
does it to get some other ideas.

Thanks,
David



----- Original Message -----
From: "Ovid" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, July 04, 2002 2:27 PM
Subject: Re: Form.pm (next installment of code review)


Hi David,

While I raise quite a few issues here, I have to say that I've rarely seen
anyone present a CGI.pm alternative that supports uploads.  I'm quite
impressed, so keep that in mind while I pick it apart :)

I'll ignore anything I may have covered in my discussion about the previous
subroutine.  Further, there are quite a few thing that I would have done
differently, but I'll try to focus specifically on those issues that I
consider bugs (and I'll skip some issues that I *think* are problems but
I'd prefer to research further before shooting my mouth off :).  I also won't
bother on commenting on issues that I've seen you mention (DOS attacks,
for instance).

    sub parse_mform_input($) {
        my $input = shift;
        my (%form_vars, @name_file, @list, %var_count);
        if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) {

    # snip

        } else { # if it is not mime
            return(parse_form_input($input)); # call the non-mime parser
        }
    }

Either you are parsing multipart/formdata (MFD) or you are not.  Your sub name
implies that this is for MFD, but then goes on to handle non-MFD.  Don't throw
curve balls at programmers :)

    if( $ENV{'CONTENT_TYPE'} =~ /multipart\/form-data; boundary=(.+)$/ ) {
        $boundary = '--'.$1;  # Using MIME to split out the form elements.

It turns out that IE 3.01 on Macintosh leaves off the initial '--' on the
boundary.  Mac users with this version of IE won't be able to upload files.
This may not seem like much of an issue, but it's important to realize that
there can be numerous OS/server/browser issues and for a CGI form handling
routine to be truly robust, it should attempt to address as many of these
issues as it can.  If you want your code to be accepted for this, it has to
be robust.  You'll get mysterious sporadic failures that can be very hard to
debug if you don't make sure to handle all of the special cases.

Further, if someone using your code notices odd, intermittant failures that
they don't get with CGI.pm, they'll probably *have* to go with CGI.pm.  Fast
code that doesn't work almost always loses to slow code that does work.

    # split out each form variable into @list
    @list = split(/$boundary/, $input);

I see you're just going ahead and processing the data without any sanity
checks.  What happens if someone uploads a large file and hits "Esc" about
halfway through?  Does every Web server check this?  If your's doesn't,
$ENV{CONTENT_LENGTH} is not going to match the length of the data.  You need to
check this to ensure that you're not processing "partial" data and thereby
risking some data corruption.

    foreach $listitem (@list){

Again, you appear to not be using strict.

    # if the data has a filename in it then
    if ($listitem =~ / name=\"(.*)\"; filename=\"(.*?)\"[\r\n]{2}/){

I see some a bad regex here.  Items:

1.  I understand that some user agents do not wrap the values in quote marks.
Also, some versions of Netscape don't escape quotes in filenames.

2.  The dot star in your regex is bad.  Read "Death to Dot Star!" for more
information.  http://www.perlmonks.org/index.pl?node_id=24640.  This can
cause slow regexes, or worse, buggy ones.

3.  You need to define the CRLF sequence.   What is the CRLF sequence that the
system that *this* version of your program is using?  Not every system is going
to use the same one.  Read through CGI.pm and see how it's handled there.  You
can search on $CRLF.

    # if the file name is not empty now
    if ($form_file{'file_name'} ne "") {
        # if the file is larger than 0 bytes
        if (length($form_file{'file_content'}) > 0){
            # creating a garanteed unique file name

Rather than going through all of you code to guarantee a unique name (which
will be thrown away anyway), why not just do some variation of this:

  use Digest::MD5;
  my $temp_file;
  do {
      $temp_file = Digest::MD5::md5_hex( $ENV{REMOTE_ADDR}, rand() );
  } until ( ! -e $upload_dir.$temp_file );

Much cleaner and easier to understand, though I confess that I was commenting
on style and not a bug, per se.

Also, I think you are overcommenting.  Your last snippet testing that the
filename isn't empty and the content length is greater than zero are good
examples.

Much of the rest of the code has issues similar to what I've outlined above.  I
would also strongly recommend breaking your subroutines up into much smaller
chunks.  They're easier to understand that way.

Cheers,
Curtis "Ovid" Poe

=====
"Ovid" on http://www.perlmonks.org/
Someone asked me how to count to 10 in Perl:
push@A,$_ for reverse q.e...q.n.;for(@A){$_=unpack(q|c|,$_);@a=split//;
shift@a;shift@a if $a[$[]eq$[;$_=join q||,@a};print $_,$/for reverse @A

__________________________________________________
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com



-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to