Re: Roundabout way of scraping dynamic content.

bruce Mon, 28 Apr 2014 10:02:04 -0700

I didn't think Scrapy could execute a page's AJAX/JavaScript the way
CasperJS/PhantomJS/Node.js can...


Does Scrapy run a headless browser process to accomplish this?

thanks


On Mon, Apr 28, 2014 at 10:17 AM, Bill Ebeling <[email protected]> wrote:
> Hey Mitch,
>
> At the risk of stating the obvious, Scrapy handles dynamic content quite
> well.  The general approach is to scrape the page, submit requests for the
> AJAX endpoints, stitch the item together, and submit it to the pipeline.
>
> That said, it's not complicated, but it's not trivial either.
>
> To your specific point, the solution is either to regex it out or to start
> fiddling with the underlying HTML.  I would not personally download someone
> else's page and then put it on a server, since the JavaScript is still going
> to be running and logging things and all that.
>
> If you want to look into writing a crawler that gets the dynamic content,
> start here: http://doc.scrapy.org/en/latest/topics/request-response.html and
> pay special attention to the meta dict.
>
> If you want more help with the specific site, provide a link so we can see
> it.
>
> Hope that helps.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.

