Bill Platt wrote:
> Hello,
>
> I have included a section of code below
> that is driving me nuts.
>
> If I don't run the Substitution operations,
> then I can successfully extract the URL
> and the imbedded anchor text from
> $parsed_html.
>
> Once I include the Substitution operations,
> then I cannot extract the same results.
>
> Even though the output text looks theoretically
> correct, I cannot see why any combination of the
> Substitution operation breaks my code.
>
> Can you offer any suggestions to me?
>
> if($parsed_html =~ m/href/)
> {
>
>     $parsed_html =~ s/\s+/ /gs;
>     $parsed_html =~ s/>/">/gs;

The above could cause problems later, because it puts a " before every >,
including the one that closes </A>.

>     $parsed_html =~ s/=http/="http/gis;
>     $parsed_html =~ s/"+/"/gs;
>     $parsed_html =~ s/'"/'/gs;
>     $_ = "$parsed_html";
>
>     @urlmatch = (@urlmatch,$2,$4) while m{
>         < \s*
>         A \s+ HREF \s* = \s* (["']) (.*?) (["'])
>         \s* > \s* (.*?) \s* <\/a \s* >

There is now a " before the last > (the closing tag has become </A">),
and you will need to account for it.

>     }gsix;
>
>     print "0=$urlmatch[0]<BR>1=$urlmatch[1]<BR>2=$urlmatch[2]<BR>";
>     print "3=$urlmatch[3]<BR>4=$urlmatch[4]<BR>5=$urlmatch[5]<BR>";
>
>     print "s0=$0<BR>s1=$1<BR>s2=$2<BR>s3=$3<BR>s4=$4<BR>s5=$5<BR>";
>     print "$_<BR><HR>$parsed_html<BR><HR>";
>
> }

my @urlmatch;
my $parsed_html = "<A HREF=http://www.fubar.com/>URL</A>\n<A HREF=http://www.fubar2.com/>URL2</A>\n";

if ($parsed_html =~ m/href/i) {

    $parsed_html =~ s/\s+/ /gs;
    $parsed_html =~ s/>/">/gs;
    $parsed_html =~ s/=http/="http/gis;
    $parsed_html =~ s/"+/"/gs;
    $parsed_html =~ s/'"/'/gs;
    $_ = $parsed_html;
    print "\$_=$_\n";

    while (
        # note: I added "* before the final \s*> of the RE
        # (or just drop the \s*> part)
        /<\s*A\s+HREF\s*=\s*(["'])(.*?)(["'])\s*>\s*([^<]*)\s*<\/a"*\s*>/gis) {

        # print $n variables out:

        for (1..9) { eval "print \"<BR>$_=', \$$_, '\n\" if defined \$$_"; }
    }
}
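
If you would rather keep your original pattern untouched, another option is to quote only the HREF values that are missing quotes, instead of adding a " before every >. The following is only a sketch of that idea, using the same two-link test string. It assumes an unquoted URL never contains whitespace, a quote character, or a >, which may not hold for your real input.

```perl
use strict;
use warnings;

# Same sample input as the test string above.
my $parsed_html = "<A HREF=http://www.fubar.com/>URL</A>\n"
                . "<A HREF=http://www.fubar2.com/>URL2</A>\n";

# Collapse runs of whitespace, as in the original code.
$parsed_html =~ s/\s+/ /g;

# Quote only unquoted HREF values and leave every other ">" alone.
# Assumption: an unquoted value has no whitespace, quote, or ">" in it.
$parsed_html =~ s/(href\s*=\s*)([^"'\s>]+)/$1"$2"/gi;

# The pattern from the question now matches, since the closing tag
# is still a plain </A>.
my @urlmatch;
while ($parsed_html =~ m{
        < \s*
        A \s+ HREF \s* = \s* (["']) (.*?) (["'])
        \s* > \s* (.*?) \s* <\/a \s* >
    }gsix) {
    push @urlmatch, $2, $4;    # URL, then anchor text
}

print "$_\n" for @urlmatch;
```

With the test string this should leave four entries in @urlmatch: each URL followed by its anchor text. Values that are already quoted are skipped by the substitution, because the character class refuses to start on a quote.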