I didn't think Scrapy had the ability to run a remote page's AJAX, the way CasperJS/PhantomJS/Node.js can...
Does Scrapy run a headless browser process to accomplish this?

Thanks

On Mon, Apr 28, 2014 at 10:17 AM, Bill Ebeling <[email protected]> wrote:
> Hey Mitch,
>
> At the risk of stating the obvious, Scrapy handles dynamic content quite
> well. The general approach is to scrape the page, submit requests for the
> AJAX endpoints, stitch the item together, and submit it to the pipeline.
>
> That said, it's not complicated, but it's not trivial, either.
>
> To your specific point, the solution is either to regex it out, or to start
> fiddling with the underlying HTML. I would not personally download someone
> else's page and then put it on a server, since the JS is still going to be
> running and logging things and all that.
>
> If you want to look into writing a crawler that gets the dynamic content,
> start here: http://doc.scrapy.org/en/latest/topics/request-response.html and
> pay special attention to the meta dict.
>
> If you want more help with the specific site, provide a link so we can see
> it.
>
> Hope that helps.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
