The safest way of doing this is not with a regex
(as the perlre doc agrees), but with HTML::TokeParser.

use HTML::TokeParser;

my $quote = "'"; # or whatever you wish to quote attributes with
my $out = '';    # the rebuilt HTML accumulates here
my $p = HTML::TokeParser->new(\$myhtmlvar);
# Creates a new parser over your var: pass a ref to the var,
# as a plain scalar is taken as a file name
while (my $token = $p->get_token){
# Loop while we're getting something to parse
  if (ref $token->[2] eq 'HASH'){  # Has attributes?
    # above is a hash of your HTML element's attribute name/value pairs
    $out .= '<'.$token->[1];     # open element and add its name
    foreach (keys %{$token->[2]}){
       $out .= ' '. $_ .'='. $quote . $token->[2]{$_} . $quote;
    } # next element attribute
    $out .= ">\n"; # Close element
  } else {
    # we've no attributes, add it straight
    # ha-ha! (placeholder: append the token's literal text to $out)
  }
} # Next parse

That's not tested, and I've got to press on, so
where the code says 'ha-ha!', you're going to have
to read the HTML::TokeParser docs (perldoc HTML::TokeParser)
and add a bit that checks the token type: if it's a start tag
($token->[0] eq 'S') or whatever, append the relevant
literal text, whose position in the token is listed in the
docs and varies for each token type.  Easy to do, but it
takes longer than I've got, sorry.
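
For what it's worth, here is a rough sketch of how the 'ha-ha!' part
might be filled in for the original question (escaping " and ' only
outside tags). It assumes the HTML::Parser distribution is installed,
and it relies on the token layouts given in perldoc HTML::TokeParser.
The sub name escape_quotes_outside_tags and the sample string are made
up for illustration, so treat this as a starting point rather than a
finished answer.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): escape " and ' in
# ordinary text only, copying tags, comments and declarations through
# exactly as they appear in the source.
sub escape_quotes_outside_tags {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    my $out = '';
    while (my $token = $p->get_token) {
        my $type = $token->[0];
        if ($type eq 'T' && !$token->[2]) {
            # Ordinary text: raw text is at index 1. Index 2 is the
            # is_data flag, set for text that should be left alone
            # (e.g. <script> content), which falls through to the else.
            my $text = $token->[1];
            $text =~ s/"/&quot;/g;
            $text =~ s/'/&#39;/g;
            $out .= $text;
        }
        elsif ($type eq 'S')  { $out .= $token->[4] }  # start tag literal
        elsif ($type eq 'E')  { $out .= $token->[2] }  # end tag literal
        elsif ($type eq 'PI') { $out .= $token->[2] }  # processing instruction
        else                  { $out .= $token->[1] }  # 'C', 'D', or data text
    }
    return $out;
}

my $ARTICLE = q{<a href="x.html">Don't say "hi"</a>};
print escape_quotes_outside_tags($ARTICLE), "\n";
```

Because every non-text token is appended from its own literal text,
the quotes inside <a href="..."> are never touched; only the text
between tags goes through the two substitutions.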

hth

Lee
---
Obligatory perl schmutter .sig:
perl -e "print chr(rand>.5?92:47) while 1"

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of
> Roland Corbet
> Sent: 28 June 2001 15:30
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Subbing " and ' in variables containing HTML
> 
> 
> I have a variable, $ARTICLE, which contains HTML formatted text.
> 
> I currently substitute all of the " and ' in the variable, using the following regex:
> 
> $ARTICLE =~ s/\"/\&quot\;/g;
> $ARTICLE =~ s/\'/\&#39\;/g;
> 
> for their HTML safe equivalents.
> 
> The problem I have is that this automatically modifies the " inside HTML tags,
> which I don't want to happen (e.g. <a href=").  Can anyone help me out with a
> regex to substitute the above characters, but not inside < > marks, inside the
> variable $ARTICLE.
> 
> Many thanks in anticipation of your help.
> 
> Regards,
> 
> Roland
> 
> -- 
> Roland Corbet
> Systems Administrator & Developer
> Psyche Solutions Limited
> Chester Road
> Cradley Heath
> West Midlands
> B64 6AB
> 
> Tel:    + 44 (0)1384 414183 Ext. 4412
> Fax:    + 44 (0)1384 414141
> Email:  [EMAIL PROTECTED]
> WWW:   http://www.psyche.net.uk
> 
> _______________________________________________
> Perl-Win32-Web mailing list
> [EMAIL PROTECTED]
> http://listserv.ActiveState.com/mailman/listinfo/perl-win32-web
