Hello all,

1. I want to count how many times the words "minutes" and "hours" occur in the response from this URL (the forum id in this URL is 3703047):
   http://steamcommunity.com/forum/3703047/trading/render/-1/?start=0&count=50
2. After this, I should increase start to 51 in the same URL while keeping count at 50. The new URL is:
   http://steamcommunity.com/forum/3703047/trading/render/-1/?start=51&count=50
   Again I have to get the count of those two words from this second URL.
3. Now the problem: I should add the counts obtained in steps 1 and 2, place the total in item["total_count"], and return this item from Scrapy.
4. The process then repeats for another id, 3381077, as shown in start_urls below.

My ultimate goal is to add up the counts whenever the id in the start URLs is the same, and then return that total for the id as item["total_count"]. The program I wrote returns the count individually for each start URL, but I need to sum the counts for URLs that share an id and return the sum. Please help me with this.

    import json
    import re

    class MySpider(BaseSpider):
        name = "TotalCount"
        allowed_domains = ["steamcommunity.com"]
        start_urls = [
            "http://steamcommunity.com/forum/3703047/Trading/render/-1/?start=0&count=50",
            "http://steamcommunity.com/forum/3703047/Trading/render/-1/?start=51&count=50",
            "http://steamcommunity.com/forum/3381077/Trading/render/-1/?start=0&count=50",
            "http://steamcommunity.com/forum/3381077/Trading/render/-1/?start=51&count=50",
        ]

        def parse(self, response):
            jsonresponse = json.loads(response.body_as_unicode())
            items = []
            item = TodaydiscussionsItem()
            # Extract the data in the JSON attribute topics_html
            topics_html = jsonresponse["topics_html"]
            AppId = str(response.url)
            # Place the start URL's id in item["Id"]
            item["Id"] = str(AppId.split("/")[4])
            # Find the number of times 'minutes, hour, hours' text occurred in the JSON data
            count_minutes = re.findall("minutes ago", topics_html)
            count_hour = re.findall("hour", topics_html)  # also matches "hours"
            item["total_count"] = len(count_minutes) + len(count_hour)
            items.append(item)
            return items

I can also change the URL to something like "http://steamcommunity.com/forum/3703047/Trading/render/-1/?start=0&count=100", but in that case I do not get the exact count of those two words from the JSON. So please don't suggest replacing the two URLs with a single URL like that.
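
To make the goal in steps 1 to 4 concrete, here is a minimal sketch in plain Python of the per-id summing, kept outside Scrapy so it can be run on its own. The function names (forum_id, count_time_words, total_counts) and the sample topics_html strings are made up for illustration; only the URL split and the two search patterns are taken from the spider.

```python
import re
from collections import defaultdict


def forum_id(url):
    # Same split the spider uses: element 4 of the "/"-separated URL is the id.
    return url.split("/")[4]


def count_time_words(topics_html):
    # Same two patterns as the spider; "hour" also matches inside "hours".
    minutes = re.findall("minutes ago", topics_html)
    hours = re.findall("hour", topics_html)
    return len(minutes) + len(hours)


def total_counts(pages):
    """Sum the counts of all pages that share a forum id.

    pages is an iterable of (url, topics_html) pairs (hypothetical input shape).
    """
    totals = defaultdict(int)
    for url, topics_html in pages:
        totals[forum_id(url)] += count_time_words(topics_html)
    return dict(totals)


# Made-up sample data standing in for the real JSON responses.
pages = [
    ("http://steamcommunity.com/forum/3703047/Trading/render/-1/?start=0&count=50",
     "5 minutes ago ... 2 hours ago"),
    ("http://steamcommunity.com/forum/3703047/Trading/render/-1/?start=51&count=50",
     "1 hour ago"),
    ("http://steamcommunity.com/forum/3381077/Trading/render/-1/?start=0&count=50",
     "10 minutes ago"),
]

print(total_counts(pages))  # {'3703047': 3, '3381077': 1}
```

Inside a spider the same accumulation would have to finish before the item is returned, for instance by requesting the second page from the callback of the first and carrying the running count along in the request's meta. That wiring is an assumption about one possible approach and is not tested here.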
