Can you show the actual scrapy full debug output? Best bet is with pastebin or the like.
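
In the meantime, one thing jumps out: everything after "#" in a URL is a fragment, which the browser keeps to itself, so "...os=win#sort_by=Released_DESC&page=2" sends the exact same request to the server as page 1 -- that would explain why both runs print identical ids. Below is a rough, untested sketch of one way to paginate by putting "page" in the real query string and following pages until one comes back empty. It assumes the Steam search endpoint honors a plain "page" query parameter, and the inline ItemscountItem is just a stand-in for whatever your project's items.py defines:

    from scrapy.http import Request
    from scrapy.item import Item, Field
    from scrapy.selector import Selector
    from scrapy.spider import BaseSpider


    class ItemscountItem(Item):
        # stand-in for the item defined in your project's items.py
        GameID = Field()


    class ScrapePriceSpider(BaseSpider):
        name = 'UpdateGames'
        # allowed_domains takes bare domain names, not URLs with a scheme
        allowed_domains = ['store.steampowered.com']

        # "page" must sit in the real query string; everything after "#"
        # is a fragment, so the server never sees it
        base_url = ('http://store.steampowered.com/search/'
                    '?sort_by=Released_DESC&os=win&page=%d')
        start_urls = [base_url % 1]

        def parse(self, response):
            hxs = Selector(response)
            game_ids = hxs.xpath(
                ".//div[@id='search_result_container']//a/@data-ds-appid"
            ).extract()

            # yield one item per game id instead of one item
            # holding the whole stringified list
            for game_id in game_ids:
                item = ItemscountItem()
                item['GameID'] = game_id
                yield item

            # follow the next page until one comes back with no results
            page = response.meta.get('page', 1)
            if game_ids:
                yield Request(self.base_url % (page + 1),
                              meta={'page': page + 1},
                              callback=self.parse)

If the endpoint keeps echoing the last page instead of eventually returning an empty one, capping the counter at the 353 pages you mentioned is a safe fallback.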
On Mon, Dec 1, 2014 at 6:41 PM, Chetan Motamarri <[email protected]> wrote:

> Hi Travis,
>
> Thanks for your reply.
> The problem I am facing is that when the page 2 URL is given in
> start_urls, the spider does not extract the games on page 2. Instead it
> extracts the games on page 1 again.
>
> *Output when I gave only page 1 in start_urls:*
>
> 335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440
>
> *Output when I gave both page 1 and page 2 in start_urls:*
>
> 335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440
>
> 335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440
>
> The same output comes both times.
>
> On Monday, December 1, 2014 6:12:08 PM UTC-7, Travis Leleu wrote:
>>
>> Hi Chetan,
>>
>> What happens when you only have the URL for page 2 in your start_urls?
>> That page seems to load fine without javascript, so I'm not convinced
>> you need any sort of ajax support.
>>
>> Please provide the output you expect from the running script, and the
>> actual output -- that will help evaluate whether the bug is in your
>> understanding of scrapy's internals (something that happens a lot to
>> me! It's a confusing piece of software at times because there is so
>> much going on...) or if something else is occurring.
>>
>> Cheers,
>> Travis
>>
>> On Mon, Dec 1, 2014 at 5:07 PM, Chetan Motamarri <[email protected]> wrote:
>>
>>> Hi All,
>>>
>>> I need to extract the *ids of the games* listed at
>>> "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC".
>>>
>>> I was able to extract the game ids on the first page, but I have no
>>> idea how to move to the next page and extract the ids there.
>>> My code is:
>>>
>>> class ScrapePriceSpider(BaseSpider):
>>>
>>>     name = 'UpdateGames'
>>>     allowed_domains = ['http://store.steampowered.com']
>>>     start_urls = ['http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1']
>>>
>>>     def parse(self, response):
>>>         hxs = Selector(response)
>>>
>>>         path = hxs.xpath(".//div[@id='search_result_container']")
>>>         item = ItemscountItem()
>>>
>>>         for ids in path:
>>>             # extracting all game ids
>>>             gameIds = ids.xpath('.//a/@data-ds-appid').extract()
>>>             item["GameID"] = str(gameIds)
>>>         return item
>>>
>>> *My goal is to extract all the game ids across the 353 pages there.*
>>> I think Ajax is used for pagination. I was not able to extract game
>>> ids from the 2nd page onwards. I tried giving
>>> "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2"
>>> in start_urls, but it made no difference.
>>>
>>> Please help me with this.
>>>
>>> Thanks,
>>> Chetan Motamarri
