Hi, I am new to using CrawlSpider.
My problem: I need to extract data for the top 5 items on this page: http://steamcommunity.com/workshop/browse/?appid=570&section=mtxitems

This is what I have done so far. I set

    start_urls = ['http://steamcommunity.com/sharedfiles/filedetails/?id=317972390&searchtext=']

and specified the rules as

    rules = (
        Rule(SgmlLinkExtractor(allow=("http://steamcommunity.com/sharedfiles/filedetails/",)), callback='parse_items'),
    )

Right now it crawls every URL that starts with "http://steamcommunity.com/sharedfiles/filedetails" on the start page <http://steamcommunity.com/workshop/browse/?appid=570&section=mtxitems>. What I want is for it to crawl only the first 5 URLs on that page that start with "http://steamcommunity.com/sharedfiles/filedetails/".

Can this be done with a CrawlSpider restriction, or by any other means?

My code:

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector
    # ExtractitemsItem comes from the project's items.py (import not shown)

    class ScrapePriceSpider(CrawlSpider):
        name = 'ScrapeItems'
        allowed_domains = ['steamcommunity.com']
        start_urls = ['http://steamcommunity.com/sharedfiles/filedetails/?id=317972390&searchtext=']

        rules = (
            Rule(SgmlLinkExtractor(allow=("http://steamcommunity.com/sharedfiles/filedetails/",)), callback='parse_items'),
        )

        def parse_items(self, response):
            hxs = HtmlXPathSelector(response)
            item = ExtractitemsItem()
            item["Item Name"] = hxs.select("//div[@class='workshopItemTitle']/text()").extract()
            item["Unique Visits"] = hxs.select("//table[@class='stats_table']/tr[1]/td[1]/text()").extract()
            item["Current Favorites"] = hxs.select("//table[@class='stats_table']/tr[2]/td[1]/text()").extract()
            return item
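To make the wanted behaviour concrete, here is a minimal sketch of "keep only the first 5 matching links". The helper name `first_detail_links`, the `Link` stand-in, and the `limit_links` method mentioned in the comments are all hypothetical and not part of the code above. It assumes the `Rule` class in the Scrapy version in use accepts a `process_links` argument, which is called with the list of links extracted from each response and may return a filtered list.

```python
from collections import namedtuple

# Prefix taken from the rule in the post.
DETAIL_PREFIX = "http://steamcommunity.com/sharedfiles/filedetails/"

# Stand-in for Scrapy's Link object (assumed to expose a `.url` attribute),
# so the sketch runs without Scrapy installed.
Link = namedtuple("Link", ["url"])


def first_detail_links(links, limit=5, prefix=DETAIL_PREFIX):
    """Return at most `limit` links, in page order, whose URL starts with `prefix`."""
    matching = [link for link in links if link.url.startswith(prefix)]
    return matching[:limit]


# Possible wiring inside the spider (hypothetical, untested against the site):
#
#   rules = (
#       Rule(SgmlLinkExtractor(allow=(DETAIL_PREFIX,)),
#            callback='parse_items',
#            process_links='limit_links'),
#   )
#
#   def limit_links(self, links):
#       return first_detail_links(links)
```

The slice is applied once per response, so this only gives "the first 5 on the listing page" if the listing page is the page the rule is applied to.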
