On Wed, Jun 27, 2001 at 05:30:11PM -0400, Craig S Monroe wrote:
> open (SOURCE, "< $filename");
> 
> while (<SOURCE>){
>  if (m/\/nic\/login/){
>  substr ($_,28,4);
>  print;
>  }
> }

That substr is a no-op: it returns the four characters starting at offset
28, but the return value is thrown away and $_ is left unchanged.  If
warnings had been turned on, you would have seen something along the lines
of:

    Useless use of substr in void context at test.pl line 5.

Always debug code with warnings and "use strict" turned on.

By the way, you also neglected to check your open call.  Always check your
open calls.
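
As a rough sketch of both points, here is one way the original loop could
look with warnings and strict enabled, the open call checked, and the
return value of substr actually used.  The sub name login_fragments is
made up for illustration, the lexical filehandle and three-argument open
are my own choice of idiom, and the 28/4 offsets are simply carried over
from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the 4 characters starting at offset 28
# of every line that mentions /nic/login.
sub login_fragments {
    my ($filename) = @_;

    # Check the open call and report why it failed.
    open(my $source, '<', $filename)
        or die "Can't open $filename: $!";

    my @fragments;
    while (my $line = <$source>) {
        next unless $line =~ m{/nic/login};
        # Skip lines too short to have anything at offset 28.
        next unless length($line) > 28;
        # substr returns the substring; $line itself is left unchanged,
        # so the return value has to be captured to be of any use.
        push @fragments, substr($line, 28, 4);
    }
    close $source;
    return @fragments;
}

if (@ARGV) {
    print "$_\n" for login_fragments($ARGV[0]);
}
```

This still depends on fixed character positions, so it only shows the
mechanics of substr, not a robust way to pull out the link text.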

 
> The result is :
> 
>                                 <small>(<a
> href="/nic/login">Login</a>)</small></p>

Exactly, because that is the whole of $_: substr never changed it, so
print prints the entire line.


 
> I would like to just get the word "Login" portion itself

Then you really don't want substr unless you are guaranteed the text will
always be in the exact same position in the line.  You may be able to get
away with a regex if the form of the HTML is guaranteed not to change, but
the real solution is to use an actual HTML parser, such as HTML::Parser.

    #!/usr/bin/perl -w

    use HTML::Parser;
    use strict;      
    use vars qw($in_nic_link);


    my $filename = "...";

    my $parser = HTML::Parser->new(
        api_version     =>  3,
        start_h         =>  [\&start_h, 'tagname, attr'],
        end_h           =>  [\&end_h,   'tagname'      ],
        text_h          =>  [\&text_h,  'text'         ],
    );                                                   
      
    $parser->parse_file($filename);


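    sub start_h {
        my($tag, $attr) = (shift, shift);
        # Not every <a> has an href (e.g. <a name="...">), so check that it
        # is defined first to avoid an uninitialized-value warning under -w.
        $in_nic_link++ if $tag eq 'a' && defined $$attr{'href'}
                                      && $$attr{'href'} =~ m!/nic/login!;
    }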

    sub end_h  { $in_nic_link-- if $_[0] eq 'a' && $in_nic_link }
    sub text_h { print "$_[0]\n" if $in_nic_link                }
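
For comparison, the regex shortcut mentioned above might look something
like the following.  This is only a sketch: the sub name link_text is
hypothetical, and it assumes the whole anchor sits on one line, that href
is the only attribute, and that it is double-quoted.  If any of that
changes, the match silently fails, which is why the parser is the safer
route.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text between <a href="/nic/login"> and
# </a> on a single line, or undef if the line does not contain that link.
sub link_text {
    my ($line) = @_;
    return $line =~ m{<a\s+href="/nic/login">([^<]*)</a>}i ? $1 : undef;
}

# Run against the line from the original post, this prints "Login".
my $text = link_text('<small>(<a href="/nic/login">Login</a>)</small></p>');
print "$text\n" if defined $text;
```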


Michael
--
Administrator                      www.shoebox.net
Programmer, System Administrator   www.gallanttech.com
--
