I haven't used SgmlLinkExtractor before, but I think you should use http://steamcommunity.com/workshop/browse/?appid=570&section=mtxitems as the start URL, then try the process_links callback in Rule() to filter the extracted URLs down to the top 5 items.
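
Something along these lines might work as a starting point. It is an untested sketch, not something I have run against the real site. The function name first_five_links, the FakeLink stand-in and the DETAIL_PREFIX constant are my own inventions for illustration. I am assuming the link extractor hands the links over in page order, so "first 5" matches "top 5", and that each item appears only once in that list.

```python
from collections import namedtuple

# Stand-in for Scrapy's Link objects so this sketch runs without Scrapy
# installed. The only thing relied on here is a .url attribute.
FakeLink = namedtuple("FakeLink", "url")

# Prefix of the item detail pages, taken from the original rule.
DETAIL_PREFIX = "http://steamcommunity.com/sharedfiles/filedetails/"


def first_five_links(links, limit=5):
    """Keep only the first `limit` item-detail links, preserving their order."""
    detail_links = [link for link in links if link.url.startswith(DETAIL_PREFIX)]
    return detail_links[:limit]


# In the spider this would be wired up roughly as follows (assumption, untested):
#   Rule(SgmlLinkExtractor(allow=(DETAIL_PREFIX,)),
#        callback='parse_items', process_links='first_five_links')
# with first_five_links defined as a spider method taking (self, links).

if __name__ == "__main__":
    # Eight made-up links; only the first five should survive the filter.
    sample = [FakeLink(DETAIL_PREFIX + "?id=%d" % i) for i in range(8)]
    for link in first_five_links(sample):
        print(link.url)
```

Since the rule has a callback and no follow=True, the item pages themselves should not be crawled any further, so the limit only needs to apply to the links found on the start page.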

On 2014-9-29, at 3:06 PM, Chetan Motamarri <[email protected]> wrote:

> Hi,
>
> I am new to use crawlspider...
>
> My problem is, I need to extract top 5 items data in this link
> (http://steamcommunity.com/workshop/browse/?appid=570&section=mtxitems). I
> have done this like this:
>
> start_urls = [
>     'http://steamcommunity.com/sharedfiles/filedetails/?id=317972390&searchtext='
> ]
>
> and specified rules as
>
> rules = (
>     Rule(SgmlLinkExtractor(allow=("http://steamcommunity.com/sharedfiles/filedetails/",)),
>          callback='parse_items'),
> )
>
> Now it is crawling through all urls that starts with
> "http://steamcommunity.com/sharedfiles/filedetails" on the start_url page.
>
> My problem is it should crawl through only first 5 urls that starts with
> "http://steamcommunity.com/sharedfiles/filedetails/" on the start_url page.
> Can we do this by crawlspider restrict or any other means ?
>
> My code:
>
> class ScrapePriceSpider(CrawlSpider):
>
>     name = 'ScrapeItems'
>     allowed_domains = ['steamcommunity.com']
>     start_urls = [
>         'http://steamcommunity.com/sharedfiles/filedetails/?id=317972390&searchtext='
>     ]
>
>     rules = (
>         Rule(SgmlLinkExtractor(allow=("http://steamcommunity.com/sharedfiles/filedetails/",)),
>              callback='parse_items'),
>     )
>
>     def parse_items(self, response):
>         hxs = HtmlXPathSelector(response)
>
>         item = ExtractitemsItem()
>
>         item["Item Name"] = hxs.select("//div[@class='workshopItemTitle']/text()").extract()
>         item["Unique Visits"] = hxs.select("//table[@class='stats_table']/tr[1]/td[1]/text()").extract()
>         item["Current Favorites"] = hxs.select("//table[@class='stats_table']/tr[2]/td[1]/text()").extract()
>         return item
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
