Hey Bill. I found what I think are the articles discussing the NASA image/scrapy example. Yeah, it's not really running a headless browser at all. It's "simulating" a piece of what the javascript returns from that given page. But for a complex dynamic site, that still isn't a "real" headless browser.
thanks

On Mon, Apr 28, 2014 at 1:30 PM, bruce <[email protected]> wrote:
> bill...
>
> not sure that's the same... ie, I don't think scrapy has a way to
> "wait" for an element to show up on a given page, based on the
> underlying ajax functions...
>
> I had talked to pablo about this a while ago and he was saying scrapy
> couldn't handle this. Are you saying it now can??
>
> This would be cool if it really can.
>
>
> On Mon, Apr 28, 2014 at 1:13 PM, Bill Ebeling <[email protected]> wrote:
>> Scrapy sends a request to the ajax address just like it does for the normal
>> webpage. You maintain data from one request to the other with the meta dict.
>>
>> There was a tutorial on it a while back about scraping the nasa website for
>> its pic of the day. Can't seem to find it now, though. If you take a look
>> at the link above, you can read all about it.
>>
>>
>> On Mon, Apr 28, 2014 at 1:01 PM, bruce <[email protected]> wrote:
>>>
>>> I didn't think scrapy had the ability to run remote ajax, similar to
>>> casperjs/phantom/nodejs...
>>>
>>> Does scrapy run a headless browser process to accomplish this??
>>>
>>> thanks
>>>
>>>
>>> On Mon, Apr 28, 2014 at 10:17 AM, Bill Ebeling <[email protected]>
>>> wrote:
>>> > Hey Mitch,
>>> >
>>> > At the risk of stating the obvious, Scrapy handles dynamic content quite
>>> > well. The general approach is to scrape the page, submit requests for
>>> > the ajax, stitch the item together, and submit it to the pipeline.
>>> >
>>> > That said, it's not complicated, but not trivial, either.
>>> >
>>> > To your specific point, the solution is either to regex it out, or to
>>> > start fiddling with the underlying html. I would not personally download
>>> > someone else's page and then put it on a server, since the js is still
>>> > going to be running and logging things and all that.
>>> >
>>> > If you want to look into writing a crawler that gets the dynamic
>>> > content, start here:
>>> > http://doc.scrapy.org/en/latest/topics/request-response.html
>>> > and pay special attention to the meta dict.
>>> >
>>> > If you want more help with the specific site, provide a link so we can
>>> > see it.
>>> >
>>> > Hope that helps.
>>> >
>>> > --
>>> > You received this message because you are subscribed to the Google
>>> > Groups "scrapy-users" group.
>>> > To unsubscribe from this group and stop receiving emails from it, send
>>> > an email to [email protected].
>>> > To post to this group, send email to [email protected].
>>> > Visit this group at http://groups.google.com/group/scrapy-users.
>>> > For more options, visit https://groups.google.com/d/optout.
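
For anyone reading this thread later, here is a rough sketch of the pattern discussed above: scrape the page, request the ajax address directly, and stitch the item together, with no headless browser involved. It is plain Python rather than Scrapy so it runs on its own. The page HTML, the `data-ajax-url` attribute, the JSON payload, and the function names `parse_page` and `parse_ajax` are all made up for illustration. In a real spider the `meta` dict here would be the one passed along on the Scrapy request, and the two functions would be the two callbacks.

```python
import json
import re
from urllib.parse import urljoin

# Hypothetical sample data standing in for two HTTP responses.
# Nothing here comes from a real site.
PAGE_URL = "http://example.com/photo/today"
PAGE_HTML = """
<html><body>
  <h1 id="title">Picture of the Day</h1>
  <div id="detail" data-ajax-url="/api/photo/42.json"></div>
</body></html>
"""
AJAX_BODY = '{"image_url": "http://example.com/img/42.jpg", "credit": "Example Credit"}'


def parse_page(page_url, html):
    """Step 1: take what the static HTML offers and find the ajax address.

    Returns the absolute ajax URL plus a meta dict carrying the partial item,
    the way data is carried from one request to the next.
    """
    title = re.search(r'<h1 id="title">(.*?)</h1>', html).group(1)
    ajax_path = re.search(r'data-ajax-url="([^"]+)"', html).group(1)
    meta = {"item": {"page_url": page_url, "title": title}}
    return urljoin(page_url, ajax_path), meta


def parse_ajax(body, meta):
    """Step 2: merge the ajax payload into the partial item from step 1."""
    item = dict(meta["item"])      # copy so the carried dict is not mutated
    item.update(json.loads(body))  # fields the javascript would have filled in
    return item


if __name__ == "__main__":
    ajax_url, meta = parse_page(PAGE_URL, PAGE_HTML)
    # A real crawler would fetch ajax_url here; this sketch uses AJAX_BODY.
    print(ajax_url)
    print(parse_ajax(AJAX_BODY, meta))
```

This only reproduces the one request the javascript would have made, which is the limitation raised above: there is no way to "wait" for an element, and a site that builds its content through many chained scripts would still need a real browser.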
