The safest way of doing this is not with a regex
(as the perlre doc agrees), but using HTML::TokeParser.
use HTML::TokeParser;

my $quote = "'";  # or whatever you wish to quote attributes with
my $out   = '';   # the rebuilt HTML accumulates here
my $p = HTML::TokeParser->new(\$myhtmlvar);
# Makes a new parser over your var. Pass a ref to the var:
# a straight scalar is taken as a file name.
while (my $token = $p->get_token) {
    # Loop while we're getting something to parse
    if ($token->[0] eq 'S') {  # Start tag? Only these carry attributes
        # $token->[2] is a hash ref of the element's attribute name/value pairs
        $out .= '<' . $token->[1];  # open element and add its name
        foreach (keys %{ $token->[2] }) {
            $out .= ' ' . $_ . '=' . $quote . $token->[2]{$_} . $quote;
        }  # next element attribute
        $out .= ">\n";  # close element
    } else {
        # Not a start tag, so no attributes: add it straight
        # ha-ha!
    }
}  # Next parse
That's not tested, and I've got to press on, so
where the code says 'ha-ha!', you're going to have
to read the perldoc HTML::TokeParser docs and
add a bit that checks the token type ($token->[0]
is 'S' for a start tag, 'E' for an end tag, 'T' for
text, and so on) and appends the relevant literal,
which is listed in the docs and sits at a different
index for each token type. Easy to do, but it takes
longer than I've got, sorry.
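
For what it's worth, here is a minimal sketch of the original problem (escaping " and ' only outside tags) done with the same parser. Treat it as a starting point rather than a finished answer. It makes some assumptions: the sub name escape_quotes_outside_tags is made up for illustration, &#39; is assumed as the replacement for ', and the position of the original text inside each token type is taken from the HTML::TokeParser docs.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Index of the original source text within each token type,
# as listed in the HTML::TokeParser docs.
my %text_index = (S => 4, E => 2, T => 1, C => 1, D => 1, PI => 2);

# Hypothetical helper: escape quotes in text, leave markup untouched.
sub escape_quotes_outside_tags {
    my ($html) = @_;
    # Pass a reference so the scalar is parsed as a document,
    # not treated as a file name.
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't create parser";
    my $out = '';
    while (my $token = $p->get_token) {
        my $type = $token->[0];
        my $text = $token->[ $text_index{$type} ];
        # Only ordinary text gets the substitution. The third field of
        # a 'T' token is true for the contents of elements such as
        # <script> and <style>, which are left alone here.
        if ($type eq 'T' && !$token->[2]) {
            $text =~ s/"/&quot;/g;
            $text =~ s/'/&#39;/g;   # assumed entity for the apostrophe
        }
        $out .= $text;
    }
    return $out;
}

my $ARTICLE = q{<a href="x.html">Don't "click"</a>};
print escape_quotes_outside_tags($ARTICLE), "\n";
```

Everything that isn't text is copied back verbatim, so tags keep whatever quoting they already had and only the text between them is changed.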
hth
Lee
---
Obligatory perl schmutter .sig:
perl -e "print chr(rand>.5?92:47) while 1"
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED]]On Behalf Of
> Roland Corbet
> Sent: 28 June 2001 15:30
> To: [EMAIL PROTECTED]
> Cc: [EMAIL PROTECTED]
> Subject: Subbing " and ' in variables containing HTML
>
>
> I have a variable, $ARTICLE, which contains HTML formatted text.
>
> I currently substitute all of the " and ' in the variable, using the following regex:
>
> $ARTICLE =~ s/\"/\&quot\;/g;
> $ARTICLE =~ s/\'/\&#39\;/g;
>
> for their HTML safe equivalents.
>
> The problem I have is that this automatically modifies the " inside html tags,
> which I don't want to happen (e.g. <a href="). Can anyone help me out with a
> regex to substitute the above characters, but not inside < > marks, inside the
> variable $ARTICLE.
>
> Many thanks in anticipation of your help.
>
> Regards,
>
> Roland
>
> --
> Roland Corbet
> Systems Administrator & Developer
> Psyche Solutions Limited
> Chester Road
> Cradley Heath
> West Midlands
> B64 6AB
>
> Tel: + 44 (0)1384 414183 Ext. 4412
> Fax: + 44 (0)1384 414141
> Email: [EMAIL PROTECTED]
> WWW: http://www.psyche.net.uk
>
> _______________________________________________
> Perl-Win32-Web mailing list
> [EMAIL PROTECTED]
> http://listserv.ActiveState.com/mailman/listinfo/perl-win32-web