Fwd: Parsing web pages

2017-03-02 Thread kavita kulkarni
Hello,

Can you suggest some effective ways to parse multiple web pages from a
web site?
I cannot use web crawling as the format of the pages is not the same. I am
interested in the data from a specific table on each page.

Thanks in advance.
Kavita


Re: Fwd: Parsing web pages

2017-03-03 Thread Lars Noodén
On 03/03/2017 02:15 AM, kavita kulkarni wrote:
> Hello,
> 
> Can you suggest some effective ways to parse multiple web pages from a
> web site?
> I cannot use web crawling as the format of the pages is not the same. I am
> interested in the data from a specific table on each page.
> 
> Thanks in advance.
> Kavita
> 

Once you have acquired the page using WWW::Mechanize, LWP, or even
just wget, you can extract the table.
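
As a rough sketch of the fetching step, assuming LWP::UserAgent is
installed: the helper name fetch_page, the timeout value, and taking the
URLs from the command line are illustrative choices only, not anything
prescribed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch one page and return its decoded content, or undef on failure.
# fetch_page is a hypothetical helper name used only for this sketch.
sub fetch_page {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);   # timeout is an assumption
    my $res = $ua->get($url);
    unless ($res->is_success) {
        warn "Could not fetch $url: ", $res->status_line, "\n";
        return;
    }
    return $res->decoded_content;
}

# URLs are taken from the command line; with no arguments nothing is fetched.
for my $url (@ARGV) {
    my $html = fetch_page($url);
    next unless defined $html;
    printf "%s: %d characters\n", $url, length $html;
}
```

The returned string can then be handed to whichever parser you choose.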

The modules HTML::TreeBuilder and HTML::TreeBuilder::XPath do extraction
rather easily if there is some consistent way to identify the table.
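
For instance, if the table can be identified by an id attribute, a
minimal sketch with HTML::TreeBuilder might look like the following. The
sample HTML, the id "prices", and the helper name extract_table are
made up for illustration; your pages will need their own way of
identifying the table.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample page; in practice this would be the content
# fetched with LWP, WWW::Mechanize, or wget.
my $html = <<'HTML';
<html><body>
<table id="layout"><tr><td>navigation</td></tr></table>
<table id="prices">
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Apple</td><td>1.20</td></tr>
  <tr><td>Pear</td><td>0.95</td></tr>
</table>
</body></html>
HTML

# Return the rows of the table with the given id as a list of array refs.
# extract_table is a hypothetical helper name used only for this sketch.
sub extract_table {
    my ($content, $table_id) = @_;
    my $tree  = HTML::TreeBuilder->new_from_content($content);
    my $table = $tree->look_down(_tag => 'table', id => $table_id);
    my @rows;
    if ($table) {
        for my $tr ($table->look_down(_tag => 'tr')) {
            # Collect both header (th) and data (td) cells of this row.
            my @cells = map { $_->as_trimmed_text }
                        $tr->look_down(_tag => qr/^t[dh]$/);
            push @rows, \@cells;
        }
    }
    $tree->delete;    # free the parse tree
    return @rows;
}

# Print the table as tab-separated lines, one per row.
my @rows = extract_table($html, 'prices');
print join("\t", @$_), "\n" for @rows;
```

If the pages have no usable id, the same look_down call accepts other
attributes (a class, for example), which is where the "consistent way to
identify the table" comes in.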

Regards,
Lars

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/




Re: Fwd: Parsing web pages

2017-03-03 Thread kavita kulkarni
Thanks, all, for your suggestions.
I will take a look at the modules and see which one works for me.

Regards,
Kavita :-)



Re: Fwd: Parsing web pages

2017-03-03 Thread Dave Gray
The related modules WWW::Mechanize::Firefox and WWW::Mechanize::PhantomJS
are worth a look too, depending on how complex or JavaScript-heavy the
pages you're parsing are and what your setup looks like (fully headless,
on your own computer, etc.).
