RE: extracting links from HTML data (my own problem using grep)

2002-01-17 Thread sanilkumar


Friends,
I am Sanil, a computer science student. I want to learn Expect
programming and Python Qt. If any of you know these, or know of sites
with tutorials for them, please mail me.
   from
 SANIL

On Tue, 15 Jan 2002, Gary Hawkins wrote:

> > However the script continues
> > print @list3;
> > my $var1='META';
> > @lista= grep{$var1} @list3;## not picked up at all
> > print @lista
> >
> > anyone any clues
>
> Suppose I'm a little confused but perhaps you meant:
>
> print @list3;
> @lista= grep(/META/, @list3);
> print @lista;
>
> /g
>
>
>

-- 

#  #
# SANILKUMAR.M.M   #
# S4 CSE   #
# REGIONAL ENGINEERING COLLEGE #
# CALICUT - KERALA #
#  #



-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




RE: extracting links from HTML data (my own problem using grep)

2002-01-16 Thread mike

On Wed, 2002-01-16 at 07:15, Gary Hawkins wrote:
> > However the script continues
> > print @list3;
> > my $var1='META';
> > @lista= grep{$var1} @list3;## not picked up at all
> > print @lista
> > 
> > anyone any clues
> 
> Suppose I'm a little confused but perhaps you meant:
> 
> print @list3;
> @lista= grep(/META/, @list3);
> print @lista;
> 
> /g

Thanks a lot.

A couple of points:

This is basically a learning / getting-rid-of-rustiness exercise.
Why is this syntax not mentioned in Programming Perl? Any ideas?





RE: extracting links from HTML data (my own problem using grep)

2002-01-15 Thread Gary Hawkins

> However the script continues
> print @list3;
> my $var1='META';
> @lista= grep{$var1} @list3;## not picked up at all
> print @lista
> 
> anyone any clues

Suppose I'm a little confused but perhaps you meant:

print @list3;
@lista= grep(/META/, @list3);
print @lista;
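
A minimal sketch of why the block form seemed to do nothing, assuming
@list3 holds lines of text (the sample data below is made up). grep keeps
every element for which its block or expression is true. The string 'META'
is always true, so grep { $var1 } returns the whole list unfiltered. To
filter on the contents of a variable, the variable has to be used inside a
match against each element:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for @list3
my @list3 = ("<META name=x>\n", "index.html\n", "about.html\n");
my $var1  = 'META';

# The block is only tested for truth; 'META' is always true,
# so nothing is filtered out.
my @all = grep { $var1 } @list3;

# Match each element ($_) against the pattern held in $var1.
# \Q...\E quotes any regex metacharacters in the variable.
my @lista = grep { /\Q$var1\E/ } @list3;

# Equivalent expression form with a literal pattern
my @listb = grep(/META/, @list3);

print scalar(@all), " ", scalar(@lista), " ", scalar(@listb), "\n";
```

Both the block form (grep BLOCK LIST) and the expression form
(grep EXPR, LIST) are valid; the difference here is only whether a match
is actually performed.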

/g






Re: extracting links from HTML data (my own problem using grep)

2002-01-15 Thread mike

On Tue, 2002-01-15 at 01:44, Lorne Easton wrote:
> I need to write some code that extracts hyperlinks from a
> scalar ($data) and puts them into an array.
> 
> I imagine that grep can do this, but my mastery of it and of
> regular expressions is not brilliant.
> 
> Can you please provide some example code, or at least point me in the right
> direction?
> 
> Cheers,
> Lorne

This bit of my script does what you want:

#!/usr/bin/perl -w
use locale;

# Collect every line of the input file that links to an .html page
my @list1 = `grep 'HREF.*\\.html' $ARGV[0]`;

foreach my $list1 (@list1) {
    $list1 =~ s/HREF="/\n/gi;     # start a new line at each link
    $list1 =~ s/(\#.*\n)//gi;     # strip name refs
    $list1 =~ s/"//gi;            # drop the remaining quotes
}

my %seen = ();
my @list3 = grep { !$seen{$_}++ } @list1;    # dedupe

However, the script continues:
print @list3;
my $var1='META';
@lista= grep{$var1} @list3;## not picked up at all
print @lista

Does anyone have any clues?
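
A sketch of the same extract-and-dedupe step in pure Perl, without
shelling out to grep, assuming the whole file has already been read into
$data. The sample HTML and the exact pattern are assumptions and are not
part of the script above. A regex is fragile against real-world HTML, so
treat this as an illustration of the %seen idiom and not as a robust link
extractor:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input standing in for the contents of the HTML file
my $data = <<'_HTML_';
<A HREF="one.html">One</A> <a href="two.html#top">Two</a>
<a href="one.html">One again</a>
_HTML_

# In list context, a /g match returns every captured group.
# The capture stops at ".html", which drops any "#name" suffix.
my @list1 = $data =~ /HREF="([^"]*\.html)[^"]*"/gi;

# Keep only the first occurrence of each link
my %seen;
my @list3 = grep { !$seen{$_}++ } @list1;

print "$_\n" for @list3;
```

With the sample input this prints one.html and two.html, each once.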






Re: extracting links from HTML data

2002-01-14 Thread Briac Pilpré

In article <[EMAIL PROTECTED]>, Lorne Easton wrote:
> I need to write some code that extracts hyperlinks from a
> scalar ($data) and puts them into an array.
> 
> I imagine that grep can do this, but my mastery of it and of
> regular expressions is not brilliant.
> 
> Can you please provide some example code, or at least point me in the right
> direction?

 If you only need the URLs of the hyperlinks, then HTML::LinkExtor is
 just what you need, and it is provided with HTML::Parser.
 HTML::SimpleLinkExtor might be worth a try too.

 http://search.cpan.org/search?dist=HTML-SimpleLinkExtor
 http://search.cpan.org/search?dist=HTML-Parser
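
 For the URL-only case, a minimal sketch with HTML::LinkExtor might look
 like the following. It assumes the module is installed and that $data
 holds the HTML. The sample document is made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical sample document
my $data = <<'_HTML_';
<a href="http://foo">bar</a>
<img src="pic.png">
<a href="http://baz">quux</a>
_HTML_

# With no callback, the parser collects the links internally
my $parser = HTML::LinkExtor->new;
$parser->parse($data);
$parser->eof;

# Each link is an array ref: [ $tag, $attr => $url, ... ]
my @urls;
foreach my $link ($parser->links) {
    my ($tag, %attr) = @$link;
    # Keep hyperlinks only; skip images and other link-bearing tags
    push @urls, $attr{href} if $tag eq 'a' && defined $attr{href};
}

print "$_\n" for @urls;
```

 The tag check is there because HTML::LinkExtor also reports links from
 other elements such as img, which a hyperlink list usually does not want.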

 Otherwise, if you want the URLs and the text inside, something like the
 following might work:

#!/usr/bin/perl -w
use strict;
use HTML::Parser 3;

my $data = <<'_HTML_';
<a href="http://foo">bar</a>
foo text baz
<a href="http://baz">quux</a>
_HTML_

my @links = parse_links($data);

# We now print the links we found
my $count;
foreach (@links) {
    print ++$count . ". Description: $_->[1]\n   URL: $_->[0]\n\n";
}


sub parse_links {
    my $data = shift;

    my ( @links, $inside );
    my $count = 0;

    # Preparing the parser
    my $linkparser = HTML::Parser->new(
        report_tags   => ['a'],    # Only dealing with <a> tags
        unbroken_text => 1,        # Avoid text split over several lines

        # Called each time an <a> is found
        start_h => [
            sub {
                # Storing the HREF attribute
                $links[$count] = shift->{href};
                # We should recall we're inside an <a> element
                $inside = 1;
            },
            'attr'
        ],

        # Called when </a> is found
        end_h => [ sub { $count++; $inside = 0; }, '' ],

        # Called when text is found
        text_h => [
            sub {
                # We're only interested in text inside <a>...</a>
                return unless $inside;
                # Store the text with the previously stored HREF
                # attribute
                $links[$count] = [ $links[$count], shift ];
            },
            'dtext'
        ],
    );
    # Launch the parser
    $linkparser->parse($data)->eof();
    return wantarray ? @links : \@links;
}

__END__

-- 
briac
A flying swallow. 
A fox stalks under a she-oak. 
A nesting dove.





Re: extracting links from HTML data

2002-01-14 Thread John W. Krahn

Lorne Easton wrote:
> 
> I need to write some code that extracts hyperlinks from a
> scalar ($data) and puts them into an array.
> 
> I imagine that grep can do this, but my mastery of it and of
> regular expressions is not brilliant.
> 
> Can you please provide some example code, or at least point me in the right
> direction?


http://search.cpan.org/search?dist=HTML-Parser


John
-- 
use Perl;
program
fulfillment





extracting links from HTML data

2002-01-14 Thread Lorne Easton

I need to write some code that extracts hyperlinks from a
scalar ($data) and puts them into an array.

I imagine that grep can do this, but my mastery of it and of
regular expressions is not brilliant.

Can you please provide some example code, or at least point me in the right
direction?

Cheers,
Lorne


