Bill Platt wrote:

> Hello,
> 
> I have included a section of code below
> that is driving me nuts.
> 
> If I don't run the Substitution operations,
> then I can successfully extract the URL
> and the imbedded anchor text from
> $parsed_html.
> 
> Once I include the Substitution operations,
> then I cannot extract the same results.
> 
> Even though the output text looks theoretically
> correct, I cannot see why any combination of the
> Substitution operation breaks my code.
> 
> Can you offer any suggestions to me?
> 
> 
> 
> if($parsed_html =~ m/href/)
> {
> 
> $parsed_html =~ s/\s+/ /gs;
> $parsed_html =~ s/>/">/gs;

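The above puts a " in front of every >, including the > that closes
</A>, so each closing tag becomes </A"> and no longer matches the
<\/a \s* > at the end of your pattern.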
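
A minimal sketch of what that substitution does to a single link. The sub name quote_before_gt and the sample string are made up for illustration; only the s/>/">/g itself comes from your code:

```perl
use strict;
use warnings;

# Hypothetical helper: applies the same s/>/">/g as the quoted code
# and returns the result, leaving the caller's string untouched.
sub quote_before_gt {
    my ($html) = @_;
    $html =~ s/>/">/g;    # a quote goes in front of EVERY '>'
    return $html;
}

# Made-up sample input with an unquoted HREF.
print quote_before_gt('<A HREF=http://www.fubar.com/>URL</A>'), "\n";
# prints: <A HREF=http://www.fubar.com/">URL</A">
```

The opening tag gets the closing quote you wanted, but the closing tag picks one up too.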

> $parsed_html =~ s/=http/="http/gis;
> $parsed_html =~ s/"+/"/gs;
> $parsed_html =~ s/'"/'/gs;
> $_ = "$parsed_html";
> 
> @urlmatch = (@urlmatch,$2,$4) while m{
>      < \s*
>      A \s+ HREF \s* = \s* (["'])  (.*?)  (["'])
>      \s* > \s* (.*?) \s* <\/a \s* >

There is a " before the last > that you will need to account for.

> }gsix;
> 
> print "0=$urlmatch[0]<BR>1=$urlmatch[1]<BR>2=$urlmatch[2]<BR>";
> print "3=$urlmatch[3]<BR>4=$urlmatch[4]<BR>5=$urlmatch[5]<BR>";
> 
> print "s0=$0<BR>s1=$1<BR>s2=$2<BR>s3=$3<BR>s4=$4<BR>s5=$5<BR>";
> print "$_<BR><HR>$parsed_html<BR><HR>";
> 
> }

my @urlmatch;
my $parsed_html =
  "<A HREF=http://www.fubar.com/>URL</A>\n<A HREF=http://www.fubar2.com/>URL2</A>\n";

if ($parsed_html =~ m/href/i) {

        $parsed_html =~ s/\s+/ /gs;
        $parsed_html =~ s/>/">/gs;
        $parsed_html =~ s/=http/="http/gis;
        $parsed_html =~ s/"+/"/gs;
        $parsed_html =~ s/'"/'/gs;
        $_ = $parsed_html;

        print "\$_=$_\n";
        # Note: I added "* before the final \s*> so the stray quote in </A">
        # is accepted (or just drop the \s*> part).
        while (/<\s*A\s+HREF\s*=\s*(["'])(.*?)(["'])\s*>\s*([^<]*)\s*<\/a"*\s*>/gis) {

                # Print each capture variable ($1 .. $9) that is defined:

                for (1..9) {
                        eval "print \"<BR>$_='\$$_'\\n\" if defined \$$_";
                }

        }
}
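
If the substitutions are only there to normalize the quoting, another option is to leave the HTML alone and let the pattern accept a double-quoted, single-quoted, or bare URL. This is only a sketch under that assumption; extract_links is a hypothetical name, it needs Perl 5.10 or later for the // operator, and a regex like this will still trip on messier HTML (extra attributes before HREF, nested tags in the anchor text), where a real parser such as HTML::Parser is the safer route:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a flat
# list of (url, anchor text, url, anchor text, ...), the same layout as
# @urlmatch in the quoted code.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ m{
        < \s* a \s+ href \s* = \s*
        (?: "([^"]*)" | '([^']*)' | ([^\s>]+) )   # quoted or bare URL
        [^>]* >                                   # rest of the opening tag
        \s* (.*?) \s*                             # anchor text
        < \s* / \s* a \s* >                       # closing tag
    }gisx) {
        my ($dq, $sq, $bare, $text) = ($1, $2, $3, $4);
        my $url = $dq // $sq // $bare;            # whichever alternative matched
        push @links, $url, $text;
    }
    return @links;
}

my @urlmatch = extract_links(
    "<A HREF=http://www.fubar.com/>URL</A>\n<A \nHREF=\"http://www.fubar2.com/\">URL2</A>\n"
);
print "$_=$urlmatch[$_]<BR>\n" for 0 .. $#urlmatch;
```

Because nothing is rewritten first, there is no stray " to account for in the closing tag.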
_______________________________________________
Perl-Unix-Users mailing list
Perl-Unix-Users@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs