Re: [Mimedefang] scanning message body

Joseph Brennan Fri, 19 Mar 2004 09:41:55 -0800

--On Friday, March 19, 2004 10:24 AM -0600 Matthew Simpson <[EMAIL PROTECTED]> wrote:

> I need some quick help scanning the message body for URLs and certain HTML
> tags.

We do this in filter() to catch such tags.  It rewrites <object as
<no-object (and likewise for iframe and script), which disables it.


   # Check for bad code in HTML parts
   if ($type eq "text/html") {
       my ($io, $badtag);
       my $bla = '';
       if ($io = $entity->open("r")) {
           while (defined(my $line = $io->getline)) {
               # Note iframe, script, object.  Use \b (not a literal space)
               # so that "<script>" and a tag name at end of line are
               # caught too, matching the substitution below.
               if ($line =~ /<(iframe|script|object)\b/i) {
                   $badtag = $1;
                   $line =~ s/<(iframe|script|object)\b/<no-$1/ig;
               }
               $bla .= $line;
           }
           $io->close;
       }
       if ($badtag) {
           # Write the modified HTML back into the part
           if ($io = $entity->open("w")) {
               $io->print($bla);
               $io->close;
           }
           md_graphdefang_log('modify', "$badtag tag deactivated");
           action_change_header("X-Warning",
                                "$badtag tag modified by Columbia filter");
           action_rebuild();
       }
   }

IMG tags carrying web bugs (tracking images) are probably the next thing
to go into this section.
Personally I use a MUA that does not show images.
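One possible way to handle web bugs, sketched in Perl on the assumption
that any <img> whose src is a remote http(s) URL should be deactivated
the same way as the tags above.  neutralize_remote_img is a hypothetical
helper, not part of MIMEDefang; inline (cid:) images are left alone.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite <img ... src="http(s)://..."> as <no-img ...>,
# assuming remote images may be web bugs.  Returns the new HTML and the
# number of tags changed.  Inline "cid:" images are not touched.
sub neutralize_remote_img {
    my ($html) = @_;
    my $count = ($html =~ s{<(img\b[^>]*?\bsrc\s*=\s*["']?https?://)}{<no-$1}gi);
    return ($html, $count || 0);
}
```

Inside filter() it could be run over the same $bla buffer before that
buffer is written back to the part.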

Scanning for URLs is much harder.  The code above does not catch
anything broken over more than one line.  You can set $/ = "\n\n" to
read by paragraphs, but I think some of the more obfuscated garbage
even spans paragraphs.  I have just started looking at this.
Basically you have to catch <a href and then buffer everything up to
the next </a>, with some kind of stream input.  I haven't peeked at
Anomy HTML Cleaner yet to see how they do it :-)  And if you really
want to do a lot of HTML cleaning, well, they do it all, more than we
want to do.
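A rough sketch of the "buffer until </a>" idea in Perl.  extract_hrefs
is a hypothetical helper, not MIMEDefang or Anomy code; it is regex
based rather than a real HTML parser, so sufficiently malformed HTML
will still get past it, and the buffer grows without limit if an <a is
never closed.

```perl
use strict;
use warnings;

# Hypothetical helper: collect href targets from <a ...>...</a> elements,
# even when the tag or the URL itself is split across lines.  Lines are
# fed in one at a time; text from the first unfinished "<a" onward is
# carried over to the next line (stream-style buffering).
sub extract_hrefs {
    my (@lines) = @_;
    my @urls;
    my $buf = '';
    for my $line (@lines) {
        $buf .= $line;
        # Remove and inspect every complete anchor now in the buffer.
        while ($buf =~ s{<a\b([^>]*)>.*?</a\s*>}{}is) {
            my $attrs = $1;
            if ($attrs =~ /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i) {
                my $url = defined $1 ? $1 : defined $2 ? $2 : $3;
                $url =~ s/\s+//g;    # undo line breaks inside the URL
                push @urls, $url;
            }
        }
        # Anything still starting with "<a" is unfinished: keep it,
        # drop the text before it, or clear the buffer if there is none.
        $buf =~ s/^.*?(?=<a\b)//is or $buf = '';
    }
    return @urls;
}
```

In filter() it might be called as something like
extract_hrefs($entity->bodyhandle->as_lines) for a text/html part, with
the resulting URLs then checked against whatever list you care about.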

Joseph Brennan
Academic Technologies Group, Academic Information Systems (AcIS)
Columbia University in the City of New York

_______________________________________________
Visit http://www.mimedefang.org and http://www.canit.ca
MIMEDefang mailing list
[EMAIL PROTECTED]
http://lists.roaringpenguin.com/mailman/listinfo/mimedefang
