Re: split

timothy adigun Fri, 03 May 2013 11:16:15 -0700

OR
On Fri, May 3, 2013 at 6:53 PM, timothy adigun <2teezp...@gmail.com> wrote:


>
> Hi,
> Please check my reply below.
>
> On Fri, May 3, 2013 at 12:59 PM, Edward and Erica Heim <
> edh...@bigpond.net.au> wrote:
>
>> Hi all,
>>
>> I'm using  LWP::UserAgent to access a website. One of the methods returns
>> HTML data e.g.
>>
>> my $data = $response->content;
>>
>> I.e. $data contains the HTML content. I want to be able to parse it line
>> by line e.g.
>>
>> foreach (split /pattern/, $data) {
>>     my $line = $_;
>> .....
>>
>> If I print $data, I can see the individual lines of the HTML data but I'm
>> not clear on the "pattern" that I should use in split or if there is a
>> better way to do this.
>>
> What exactly are you splitting? And what pattern are you using?
>
>
>> I understand that there are packages to parse HTML code but this is also
>> a learning exercise for me.
>>
>
>     Please don't parse HTML with regular expressions. It's not that it
> can't be done or hasn't been done, but it is labor in futility. Rather,
> learn modules like HTML::TreeBuilder and the rest from CPAN that can help
> you do what you want.
>
> Secondly, parse the file first before "splitting".
>
> For example, say you want to parse http://www.perl.org and print out the
> trimmed text of that web page. You could do it like so:
>
> [CODE]
>
> #!/usr/bin/perl
> use warnings;
> use strict;
> use LWP::UserAgent;
> use HTML::TreeBuilder 5 -weak;
>
> ## url to get
> my $url = 'http://www.perl.org';
>
> ## get the file
> my $ua = LWP::UserAgent->new;
> my $resp = $ua->get($url);
> die "Cannot fetch $url: ", $resp->status_line, "\n" unless $resp->is_success;
>
> ## parse the HTML file
> my $tree = HTML::TreeBuilder->new;
> $tree->parse( $resp->decoded_content );
> $tree->eof;    ## tell the parser the input is complete
> print $tree->as_trimmed_text, "\n";
>
> [/CODE]
> Hope this helps somehow.
>
>>
>>
You can do all of this in just a few lines:
[CODE]
use warnings;
use strict;
use HTML::TreeBuilder 5 -weak;

my $url = 'http://www.perl.org';

my $tree = HTML::TreeBuilder->new_from_url($url);

print $tree->as_text;
[/CODE]

Seriously, use HTML parsing modules. Please check either
https://metacpan.org/ or http://www.cpan.org/
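
That said, if the goal is only to walk the raw content line by line, as in the original question, splitting on line breaks is not the same as parsing the markup. Below is a minimal sketch using core Perl only. The $data string is a made-up stand-in for $response->decoded_content, and content_lines is a hypothetical helper name, not something from LWP.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $response->decoded_content.
my $data = "<html>\r\n<body>\n<p>Hello</p>\n</body>\n</html>\n";

# Split on any line break. \R (Perl 5.10 and later) matches \n, \r\n and \r,
# so mixed line endings from a web server are handled the same way.
sub content_lines {
    my ($text) = @_;
    return split /\R/, $text;
}

my $n = 0;
for my $line ( content_lines($data) ) {
    $n++;
    print "$n: $line\n";
}
```

This only gives you lines of text. To pull out tags or attributes, hand the content to a parser module instead.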

If you are on a *nix system, "lynx" may come in handy, like so:
lynx -dump http://www.perl.org
The output can then be read into Perl and used as one likes.
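
One way to get that output into Perl is a pipe open. This is a rough sketch, assuming lynx is installed and on your PATH. lines_from_command is a hypothetical helper name, and the URL is taken from the command line so nothing is fetched unless you ask for it.

```perl
use strict;
use warnings;

# Run a command and return its output as a list of chomped lines.
# The list form of pipe open runs the command without going through a shell.
sub lines_from_command {
    my @cmd = @_;
    open( my $fh, '-|', @cmd ) or die "Cannot run $cmd[0]: $!";
    chomp( my @lines = <$fh> );
    close $fh or warn "$cmd[0] exited with status $?\n";
    return @lines;
}

# Usage (assumed script name): perl dump.pl http://www.perl.org
if (@ARGV) {
    my @page = lines_from_command( 'lynx', '-dump', $ARGV[0] );
    # Print only the non-blank lines of the rendered page.
    print "$_\n" for grep { /\S/ } @page;
}
```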
Hope this helps.

> Thanks in advance, Edward



-- 
Tim
