Can anyone help me figure out how to solve this?

On Wednesday, April 9, 2014 5:17:27 PM UTC+5:30, masroor javed wrote:
>
> Hi,
> I want to extract the final-page data, such as the title and description.
> I wrote a spider for the craigslist website.
> Every listing page has 100 links. I can get the next-page URL together with all the
> links, but I also want to visit each link and scrape the data from the final
> (detail) page.
> Can anyone help me sort out my problem?
> My spider code is:
>
> from scrapy.contrib.spiders import CrawlSpider, Rule
> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
> from scrapy.selector import HtmlXPathSelector
> from scrapy.http import Request
> from pagination.items import PaginationItem
>
> class InfojobsSpider(CrawlSpider):
>     name = "check"
>     allowed_domains = ["craigslist.org"]
>     start_urls = [
>         "http://sfbay.craigslist.org/npo/"
>     ]
>     rules = (
>         Rule(SgmlLinkExtractor(allow=(r'index.*?html'),
>                                restrict_xpaths=('//a[@class="button next"]')),
>              callback='parse_item', follow=True),
>     )
>     fname = 1
>
>     def parse_start_url(self, response):
>         return self.parse_item(response)
>
>     def parse_item(self, response):
>         hxs = HtmlXPathSelector(response)
>         titles = hxs.select('//span[@class="pl"]')
>         for title in titles:
>             title = title.select('a/@href').extract()
>
> NOTE:
> The link URLs end up in title. Now I want to visit every URL one by one and
> extract the data from the final page.
> Please help me.
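Here is a minimal sketch of one way to do this, keeping the old Scrapy API shown in the
question (scrapy.contrib, HtmlXPathSelector): in parse_item, instead of only extracting
the href, yield a Request for each listing URL with a callback that parses the detail
page. The parse_detail method, the 'title' and 'description' fields on PaginationItem,
and the XPaths for the craigslist detail page are assumptions; adjust them to your own
Item class and the actual page markup.

    from urlparse import urljoin

    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector
    from scrapy.http import Request
    from pagination.items import PaginationItem

    class InfojobsSpider(CrawlSpider):
        name = "check"
        allowed_domains = ["craigslist.org"]
        start_urls = ["http://sfbay.craigslist.org/npo/"]
        rules = (
            Rule(SgmlLinkExtractor(allow=(r'index.*?html'),
                                   restrict_xpaths=('//a[@class="button next"]')),
                 callback='parse_item', follow=True),
        )

        def parse_start_url(self, response):
            return self.parse_item(response)

        def parse_item(self, response):
            hxs = HtmlXPathSelector(response)
            # Yield one Request per listing link instead of only extracting the href.
            for href in hxs.select('//span[@class="pl"]/a/@href').extract():
                url = urljoin(response.url, href)  # hrefs may be relative
                yield Request(url, callback=self.parse_detail)

        def parse_detail(self, response):
            # Runs once for each final (detail) page.
            hxs = HtmlXPathSelector(response)
            item = PaginationItem()
            # Field names and XPaths below are assumptions; change them to match
            # your PaginationItem definition and the detail-page HTML.
            item['title'] = hxs.select('//h2[@class="postingtitle"]/text()').extract()
            item['description'] = hxs.select('//section[@id="postingbody"]//text()').extract()
            yield item

The key point is that the CrawlSpider rule only handles pagination; following the 100
listing links is done by yielding Requests from parse_item, and the scraping of each
final page happens in the callback you pass to those Requests.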
