Re: Roundabout way of scraping dynamic content.

Bill Ebeling Mon, 28 Apr 2014 07:18:00 -0700

Hey Mitch,

At the risk of stating the obvious, Scrapy handles dynamic content quite 
well.  The general approach is to scrape the page, submit requests for the 
AJAX endpoints, stitch the item together, and submit it to the pipeline.


That said, it's not complicated, but it's not trivial either.

To your specific point, the solution is either to regex it out, or to start 
fiddling with the underlying HTML.  I would not personally download someone 
else's page and then put it on a server, since the page's JavaScript is 
still going to be running, logging things and so on.
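
For the "regex it out" option, here is a minimal sketch using only the 
standard library. It assumes the data you want sits in an inline script as 
a JSON-compatible object assigned to a variable; the variable name 
pageData, the sample HTML, and the helper extract_page_data are all made 
up for illustration, so adjust them to whatever the real page contains.

```python
import json
import re

# Hypothetical page source: the data is embedded in an inline script.
HTML = """
<html><body>
<script>
var pageData = {"name": "Widget", "price": 9.99};
</script>
</body></html>
"""

# Capture the object literal assigned to the (assumed) variable name.
# The non-greedy match stops at the first "};", so this only suits
# simple objects that do not contain that sequence themselves.
PATTERN = re.compile(r"var\s+pageData\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def extract_page_data(html):
    """Return the embedded object as a dict, or None if it is absent."""
    match = PATTERN.search(html)
    if match is None:
        return None
    # Works only when the literal is valid JSON (double-quoted keys, etc.).
    return json.loads(match.group(1))


if __name__ == "__main__":
    print(extract_page_data(HTML))
```

This is brittle by nature: if the site changes the variable name or emits 
a literal that is not valid JSON, the pattern or the parsing step will 
need to change with it.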

If you want to look into writing a crawler that gets the dynamic content, 
start here: http://doc.scrapy.org/en/latest/topics/request-response.html 
and pay special attention to the meta dict.

If you want more help with the specific site, provide a link so we can see 
it.

Hope that helps.

