Hi Chetan,

SteamCommunity.com appears to load the discussion data via AJAX rather than including it in the HTML source. AFAIK, you can't scrape it with Scrapy alone, since Scrapy doesn't execute JavaScript (at least, not without serious modification).
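If you want to double-check that the data really is missing from the raw HTML, a rough sketch like the one below may help. It uses only the Python standard library. The names `static_text`, `is_in_static_html` and `fetch` are made up for illustration, and the User-Agent value is just an assumption in case the site rejects the default one.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html):
    """Return the text present in the raw HTML, without running any JavaScript."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def is_in_static_html(html, snippet):
    """True if `snippet` appears (case-insensitively) in the static page text."""
    return snippet.lower() in static_text(html).lower()


def fetch(url):
    """Hypothetical helper: plain HTTP GET, roughly what a basic spider receives."""
    # The User-Agent value is an assumption, not something the site documents.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

To try it, open page 2 of the discussions in a browser, copy one of the titles, and call `is_in_static_html(fetch(url), title)` with the discussions URL. A result of `False` suggests the title arrives in a later AJAX request rather than in the initial HTML.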
In order to scrape JavaScript-loaded data, you'll need something that actually executes the page's JavaScript, such as a headless browser driven from Python. Common choices are PhantomJS (a headless browser) and Selenium (which automates a real browser and has Python bindings). There is also a lightweight package, dryscrape, that seems a bit simpler than rolling your own. I've never used it, so I can't speak to its quality, but the documentation appears decent. You might try this first. Check out http://dryscrape.readthedocs.org/en/latest/ for more information about it.

-Travis

On Thu, Oct 2, 2014 at 7:38 PM, Chetan Motamarri <[email protected]> wrote:
> Hi,
>
> I need to extract start date & time of all discussions(in all pages) from
> this url "http://steamcommunity.com/workshop/discussions/?appid=570" . I
> tried in lot of ways but can't.
>
> Here discussions are changing dynamically.(i.e. when page 2 is clicked, I
> was not able find 2nd page discussions in source code).
>
> My idea is if there is any source file of discussions (like .xml/.json)
> then we can pull data directly from that src page. But I was not able to
> find out location of source file of discussions.
>
> How to use scrapy here ?
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
