Excellent, thanks! I'll look this over and give her a go! I appreciate your time and energy.
Dan

> -----Original Message-----
> From: david [mailto:[EMAIL PROTECTED]]
> Sent: Wednesday, February 12, 2003 3:36 PM
> To: [EMAIL PROTECTED]
> Subject: RE: HTML::TokeParser
>
> Dan Muey wrote:
>
> > whatever is in between the <a> tags.
> >
> > I wonder if it's possible to do something like this:
> >
> > if($token->[0] eq 'a'){
> >     print $token->[1]{href} || "what?","\n";
> >     my $link_guts = $tok->get_trimmed_text("/a");
> >
> > and then somehow grab the 'src' and 'alt' attributes from each img
> > tag in $link_guts if it's an image, the regular text if it's not,
> > and probably all three if it has both images and text
>
> that's why parsing HTML is tricky and XML is on the way to the rescue.
> if you use get_token() instead of get_tag(), it might be easier.
> get_token() returns every token, and it is the programmer's
> responsibility to decide what to do with each one. get_tag() eats up
> the tokens you don't want, so it's tricky:
>
> #!/usr/bin/perl -w
> use strict;
>
> use HTML::TokeParser;
>
> my $tok = new HTML::TokeParser(*DATA) || die $!;
> while(1){
>
>     my $token = $tok->get_token();
>     last unless($token);
>
>     if($token->[0] eq 'T'){
>         print "Text: $token->[1]\n" if($token->[1] =~ /\S/);
>     }elsif($token->[0] eq 'S' && $token->[1] eq 'img'){
>         print "IMG $token->[2]{src}\n";
>     }elsif($token->[0] eq 'S' && $token->[1] eq 'a'){
>         print "LINK $token->[2]{href}\n";
>     }
> }
>
> __END__
>
> all tokens are returned to you no matter where they are, so
> <img> within <a>, <a> within <img>, <a> within <a>, etc. will
> all be returned to you. if you add a little bit more logic,
> it's easy to find all nested tags...
>
> david
>
> --
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
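
For what it's worth, the "little bit more logic" david mentions could look something like the sketch below. It is only one possible way to do it, not something from the thread: the helper name extract_links, the hash layout it returns, and the sample HTML are all made up for illustration. It remembers whether it is currently inside an <a> element and attaches any text and any <img> 'src'/'alt' attributes seen before the matching </a> to that link.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use HTML::TokeParser;

# Hypothetical helper (not from the thread): returns one hash per <a> element
# holding its href, the text inside it, and the src/alt of any nested <img>.
sub extract_links {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $tok = HTML::TokeParser->new(\$html) or die "cannot create parser: $!";

    my @links;
    my $current;    # the link being filled in; undef while outside <a>

    while (my $token = $tok->get_token) {
        my $type = $token->[0];

        if ($type eq 'S' && $token->[1] eq 'a') {
            # Start tag token: ["S", $tag, $attr, $attrseq, $text]
            $current = { href => $token->[2]{href}, text => '', images => [] };
        }
        elsif ($type eq 'S' && $token->[1] eq 'img' && $current) {
            # An image inside the current link: keep its src and alt.
            push @{ $current->{images} },
                { src => $token->[2]{src}, alt => $token->[2]{alt} };
        }
        elsif ($type eq 'T' && $current) {
            # Text token: ["T", $text, $is_data]
            $current->{text} .= $token->[1];
        }
        elsif ($type eq 'E' && $token->[1] eq 'a' && $current) {
            # End tag token: ["E", $tag, $text]. Tidy whitespace and store.
            for ($current->{text}) { s/\s+/ /g; s/^ //; s/ $//; }
            push @links, $current;
            undef $current;
        }
    }
    return @links;
}

# Made-up sample input, just to exercise the helper.
my $sample = '<p><a href="/home"><img src="home.png" alt="Home"> Start page</a>'
           . ' and <a href="/faq">FAQ</a></p>';

for my $link (extract_links($sample)) {
    print "LINK $link->{href}\n";
    print "  TEXT $link->{text}\n" if length $link->{text};
    print "  IMG  $_->{src} (alt: $_->{alt})\n" for @{ $link->{images} };
}
```

Two caveats: the sketch assumes links are not nested (an <a> inside an <a> would simply start a new record and drop the unfinished one), and text tokens from get_token() are stored as they come, so entities such as &amp;amp; are not decoded.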