Hi Travis,

Thanks for your reply. The problem I am facing is that when the page 2 URL is given in start_urls, the spider does not extract the games on page 2; instead it extracts the games on page 1 again. (A note on why this happens follows the quoted thread below.)
Output when I gave only page 1 in start_urls:

335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440

Output when I gave both page 1 and page 2 in start_urls:

335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440
335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440

The same output comes back for both pages.

On Monday, December 1, 2014 6:12:08 PM UTC-7, Travis Leleu wrote:
>
> Hi Chetan,
>
> What happens when you only have the URL for page 2 in your start_urls?
> That page seems to load fine without javascript, so I'm not convinced you
> need any sort of ajax support.
>
> Please provide the output you expect from the running script, and the
> actual output -- that will help evaluate whether the bug is in your
> understanding of scrapy's internals (something that happens a lot to me!
> It's a confusing piece of software at times because there is so much going
> on...) or if something else is occurring.
>
> Cheers,
> Travis
>
> On Mon, Dec 1, 2014 at 5:07 PM, Chetan Motamarri <[email protected]> wrote:
>
>> Hi All,
>>
>> I need to extract the IDs of the games at
>> http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC
>>
>> I was able to extract the game IDs on the first page, but I have no idea
>> how to move to the next page and extract the IDs there. My code is:
>>
>> class ScrapePriceSpider(BaseSpider):
>>
>>     name = 'UpdateGames'
>>     allowed_domains = ['http://store.steampowered.com']
>>     start_urls = ['http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1']
>>
>>     def parse(self, response):
>>         hxs = Selector(response)
>>
>>         path = hxs.xpath(".//div[@id='search_result_container']")
>>         item = ItemscountItem()
>>
>>         for ids in path:
>>             # extract all game ids in the result container
>>             gameIds = ids.xpath('.//a/@data-ds-appid').extract()
>>             item["GameID"] = str(gameIds)
>>
>>         return item
>>
>> My goal is to extract all the game IDs across the 353 pages given there.
>> I think Ajax is used for pagination. I was not able to extract game IDs
>> from the 2nd page onwards. I tried giving
>> "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2"
>> in start_urls, but with no luck.
>>
>> Please help me with this.
>>
>> Thanks,
>> Chetan Motamarri
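A note on the behaviour reported above: everything after the `#` in a URL is a fragment, which is handled client-side and never sent to the server. Scrapy therefore issues the exact same HTTP request for both the "page=1" and "page=2" URLs from the thread, which would explain the duplicated output. A minimal sketch of the difference (the query-string `page` parameter in the corrected URL is an assumption about Steam's search endpoint and should be verified in a browser):

from urllib.parse import urlparse

# Both URLs from the thread differ only in the fragment,
# which never reaches the server:
page1 = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1"
page2 = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2"

print(urlparse(page1).query == urlparse(page2).query)  # True -- same request, same page

# To request a different page, 'page' has to live in the query string
# (assumption: the endpoint honours a 'page' query parameter):
page2_fixed = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win&page=2"
print(urlparse(page2_fixed).query)  # sort_by=Released_DESC&os=win&page=2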
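For completeness, a sketch of how the spider could walk all result pages by yielding one follow-up request per page. It is untested against the live site and assumes the query-string `page` parameter above works; items are simplified to plain dicts, and allowed_domains is corrected to a bare domain (Scrapy expects domain names there, not URLs):

from scrapy import Request, Spider

class UpdateGamesSpider(Spider):
    name = 'UpdateGames'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['store.steampowered.com']

    # assumption: the search endpoint accepts 'page' as a query parameter
    base_url = ('http://store.steampowered.com/search/'
                '?sort_by=Released_DESC&os=win&page=%d')

    def start_requests(self):
        yield Request(self.base_url % 1, meta={'page': 1})

    def parse(self, response):
        # Same XPath idea as the code quoted above: every data-ds-appid
        # attribute inside the search result container is a game id.
        ids = response.xpath(
            "//div[@id='search_result_container']//a/@data-ds-appid"
        ).extract()
        for game_id in ids:
            yield {'GameID': game_id}

        # Keep requesting the next page until one comes back empty
        # (the thread mentions 353 pages in total).
        if ids:
            next_page = response.meta['page'] + 1
            yield Request(self.base_url % next_page,
                          meta={'page': next_page})

Run with something like "scrapy runspider updategames.py -o ids.csv" to collect every id into one feed; yielding one item per id also avoids the original code's pattern of overwriting item["GameID"] and returning a single item per page.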
