OR

On Fri, May 3, 2013 at 6:53 PM, timothy adigun <2teezp...@gmail.com> wrote:
> Hi,
> Please check my reply below.
>
> On Fri, May 3, 2013 at 12:59 PM, Edward and Erica Heim <edh...@bigpond.net.au> wrote:
>
>> Hi all,
>>
>> I'm using LWP::UserAgent to access a website. One of the methods returns
>> HTML data, e.g.
>>
>> my $data = $response->content;
>>
>> I.e. $data contains the HTML content. I want to be able to parse it line
>> by line, e.g.
>>
>> foreach (split /pattern/, $data) {
>>     my $line = $_;
>>     .....
>>
>> If I print $data, I can see the individual lines of the HTML data but I'm
>> not clear on the "pattern" that I should use in split or if there is a
>> better way to do this.
>
> What exactly are you splitting? And what exactly is the pattern you
> are using?
>
>> I understand that there are packages to parse HTML code but this is also
>> a learning exercise for me.
>
> Please don't parse HTML files with regexps. It's not that it can't be
> done or hasn't been done, but it is labor in futility. Rather, learn
> modules like HTML::TreeBuilder and the rest from CPAN that can help you do
> what you want.
>
> Secondly, parse the file first before "splitting".
>
> If I may: say one wants to parse http://www.perl.org and print out the
> trimmed text on that web page. One can do it like so:
>
> [CODE]
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use LWP::UserAgent;
> use HTML::TreeBuilder 5 -weak;
>
> ## url to get
> my $url = 'http://www.perl.org';
>
> ## get the file
> my $ua = LWP::UserAgent->new;
> my $resp = $ua->request( HTTP::Request->new( GET => $url ) );
> die $resp->status_line unless $resp->is_success;
>
> ## parse the HTML file
> my $tree = HTML::TreeBuilder->new;
> $tree->parse( $resp->decoded_content );
> $tree->eof;    ## signal end of document so the tree is complete
> print $tree->as_trimmed_text;
>
> [/CODE]
> Hope this helps somehow.

You can do all of this in just a few lines:

[CODE]
use warnings;
use strict;
use HTML::TreeBuilder 5 -weak;

my $url = 'http://www.perl.org';
my $tree = HTML::TreeBuilder->new_from_url($url);
print $tree->as_text;
[/CODE]

Seriously, use HTML parsing modules.

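On the "pattern" part of the question, purely as a learning exercise: a minimal sketch, assuming $data is an ordinary string whose lines end in LF or CRLF, is to split on /\r?\n/. The sample string and the html_lines helper below are made up for illustration. Note that this only breaks the text into lines; it does not parse the HTML.

```perl
#!/usr/bin/perl
use warnings;
use strict;

## hypothetical stand-in for $response->decoded_content
my $data = "<html>\r\n<body>\n<p>Hello</p>\n</body>\n</html>\n";

## split on LF or CRLF; split drops trailing empty fields,
## so a final newline does not produce an extra empty line
sub html_lines {
    my ($text) = @_;
    return split /\r?\n/, $text;
}

my @lines = html_lines($data);

## print each line with its 1-based line number
printf "%d: %s\n", $_ + 1, $lines[$_] for 0 .. $#lines;
```

Splitting like this is fine for eyeballing the text, but a tag or attribute can span several lines, so anything that depends on the HTML structure is still better left to a parsing module.
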
Please check either https://metacpan.org/ or http://www.cpan.org/

If you are on a *nix system, "lynx" may come in handy, like so:

lynx -dump http://www.perl.org

The output can then be piped into perl and used as one likes.
Hope this helps.

>> Thanks in advance, Edward
>>
>> --
>> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
>> For additional commands, e-mail: beginners-h...@perl.org
>> http://learn.perl.org/

--
Tim