From:                   "Jon Shoberg" <[EMAIL PROTECTED]>

>  I need to remove HTML scripts from some pages.
> 
>  I have to replace
> 
>  <*script*>*</*script*> with blanks. This includes all
> javascript/vbscript
>  in between the tags
> 
>  I'm using the * as guidelines to show it must match several
>  variations.
> 
>  Thoughts ? Ideas? Suggestions?

use HTML::JFilter; # http://Jenda.Krynicky.cz/#HTML::JFilter

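# Read the list of allowed tags and attributes in one go.
# http://Jenda.Krynicky.cz/perl/DefaultAllowedHTML.txt
open my $fh, '<', 'DefaultAllowedHTML.txt'
    or die "Cannot open DefaultAllowedHTML.txt: $!";
my $filter_tags = do { local $/; <$fh> };
close $fh;

my $filter = HTML::JFilter->new($filter_tags, 'ssi');

# Filter one file into another ...
$filter->doFILE($path_to_the_file, $path_to_output_file);
# ... or filter a string in memory.
my $result = $filter->doSTRING( $source );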


The module uses HTML::Parser to parse the HTML (therefore you should
be fairly safe with it) and strips all tags and attributes NOT
listed in the first parameter to "HTML::JFilter->new".

For tags like <script> it removes the tag together with its body.
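
If you only need to drop the <script> elements and want to keep
everything else untouched, you can also call HTML::Parser directly.
What follows is just a minimal sketch of that idea, not what
HTML::JFilter does internally; strip_scripts is a name made up for
this example, and it assumes the whole page is already in a string.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not part of HTML::JFilter): returns a copy of
# $html with <script> elements, including their bodies, left out.
sub strip_scripts {
    my ($html) = @_;
    my $out = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Append the original source text of every event the parser
        # reports to the output.
        default_h   => [ sub { $out .= shift }, 'text' ],
    );

    # No events are generated for these elements or their content,
    # so they never reach the handler above.
    $parser->ignore_elements('script');

    $parser->parse($html);
    $parser->eof;

    return $out;
}

my $page = '<p>Hi</p><script type="text/javascript">alert(1)</script><b>ok</b>';
print strip_scripts($page), "\n";
```

Unlike the whitelist approach above, this leaves things like onclick
attributes and javascript: links in place, so it is not enough if the
goal is to make untrusted HTML safe.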

Jenda
===== [EMAIL PROTECTED] === http://Jenda.Krynicky.cz =====
When it comes to wine, women and song, wizards are allowed 
to get drunk and croon as much as they like.
        -- Terry Pratchett in Sourcery

