Hi Travis,

Thanks for your reply. The problem I am facing is that when the page 2 URL is given in start_urls, the spider does not extract the games on page 2; instead it extracts the games on page 1 again. (A note on why this happens follows the quoted thread below.)
Output when I gave only page 1 in start_urls:

335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440

Output when I gave both page 1 and page 2 in start_urls:

335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440
335410,333790,283080,334970,327490,333540,313680,314300,315396,314450,335210,292750,315080,334400,334401,334402,324331,335,70,321660,331910,318530,303830,328140,334080,279440

The same output comes back for both pages.

On Monday, December 1, 2014 6:12:08 PM UTC-7, Travis Leleu wrote:
>
> Hi Chetan,
>
> What happens when you only have the URL for page 2 in your start_urls?
> That page seems to load fine without javascript, so I'm not convinced you
> need any sort of ajax support.
>
> Please provide the output you expect from the running script, and the
> actual output -- that will help evaluate whether the bug is in your
> understanding of scrapy's internals (something that happens a lot to me!
> It's a confusing piece of software at times because there is so much going
> on...) or if something else is occurring.
>
> Cheers,
> Travis
>
> On Mon, Dec 1, 2014 at 5:07 PM, Chetan Motamarri <[email protected]> wrote:
>
>> Hi All,
>>
>> I need to extract the IDs of the games at
>> http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC
>>
>> I was able to extract the game IDs on the first page, but I have no idea
>> how to move to the next page and extract the IDs there. My code is:
>>
>> class ScrapePriceSpider(BaseSpider):
>>
>>     name = 'UpdateGames'
>>     allowed_domains = ['http://store.steampowered.com']
>>     start_urls = ['http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1']
>>
>>     def parse(self, response):
>>         hxs = Selector(response)
>>
>>         path = hxs.xpath(".//div[@id='search_result_container']")
>>         item = ItemscountItem()
>>
>>         for ids in path:
>>             # extract all game ids in the result container
>>             gameIds = ids.xpath('.//a/@data-ds-appid').extract()
>>             item["GameID"] = str(gameIds)
>>
>>         return item
>>
>> My goal is to extract all the game IDs across the 353 pages given there.
>> I think Ajax is used for pagination. I was not able to extract game IDs
>> from the 2nd page onwards. I tried giving
>> "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2"
>> in start_urls, but with no luck.
>>
>> Please help me with this.
>>
>> Thanks,
>> Chetan Motamarri
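A note on the behaviour reported above: everything after the `#` in a URL is a fragment, which is handled client-side and never sent to the server. Scrapy therefore issues the exact same HTTP request for both the "page=1" and "page=2" URLs from the thread, which would explain the duplicated output. A minimal sketch of the difference (the query-string `page` parameter in the corrected URL is an assumption about Steam's search endpoint and should be verified in a browser):

from urllib.parse import urlparse

# Both URLs from the thread differ only in the fragment,
# which never reaches the server:
page1 = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=1"
page2 = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win#sort_by=Released_DESC&page=2"

print(urlparse(page1).query == urlparse(page2).query)  # True -- same request, same page

# To request a different page, 'page' has to live in the query string
# (assumption: the endpoint honours a 'page' query parameter):
page2_fixed = "http://store.steampowered.com/search/?sort_by=Released_DESC&os=win&page=2"
print(urlparse(page2_fixed).query)  # sort_by=Released_DESC&os=win&page=2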
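For completeness, a sketch of how the spider could walk all result pages by yielding one follow-up request per page. It is untested against the live site and assumes the query-string `page` parameter above works; items are simplified to plain dicts, and allowed_domains is corrected to a bare domain (Scrapy expects domain names there, not URLs):

from scrapy import Request, Spider

class UpdateGamesSpider(Spider):
    name = 'UpdateGames'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['store.steampowered.com']

    # assumption: the search endpoint accepts 'page' as a query parameter
    base_url = ('http://store.steampowered.com/search/'
                '?sort_by=Released_DESC&os=win&page=%d')

    def start_requests(self):
        yield Request(self.base_url % 1, meta={'page': 1})

    def parse(self, response):
        # Same XPath idea as the code quoted above: every data-ds-appid
        # attribute inside the search result container is a game id.
        ids = response.xpath(
            "//div[@id='search_result_container']//a/@data-ds-appid"
        ).extract()
        for game_id in ids:
            yield {'GameID': game_id}

        # Keep requesting the next page until one comes back empty
        # (the thread mentions 353 pages in total).
        if ids:
            next_page = response.meta['page'] + 1
            yield Request(self.base_url % next_page,
                          meta={'page': next_page})

Run with something like "scrapy runspider updategames.py -o ids.csv" to collect every id into one feed; yielding one item per id also avoids the original code's pattern of overwriting item["GameID"] and returning a single item per page.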
