Then just yield a new Request instead of returning the url. BTW, you should also avoid the double loop. It's possible to extract all the links with a single XPath expression: //div[@class="listItemDetail exhibitorDetail"]/h3[@class="name"]/a/@href
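For instance, parse_item could look roughly like this (an untested sketch, keeping the same HtmlXPathSelector API and imports your spider already has):

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    # One XPath expression instead of the nested loops:
    links = hxs.select('//div[@class="listItemDetail exhibitorDetail"]'
                       '/h3[@class="name"]/a/@href').extract()
    for link in links:
        # Make relative links absolute (urljoin is already imported in your spider):
        if not link.startswith('http'):
            link = urljoin('http://www.infosec.co.uk', link)
        # Yield the Request instead of returning the url:
        yield Request(url=link, callback=self.getwebsitename)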
P.S. If I understand you right, you may also let Scrapy crawl all the links itself instead of yielding the Requests by hand.

On Wednesday, 16 April 2014 at 12:34:19 UTC+3, masroor javed wrote:
>
> Hi Svyatoslav, I just want to return all the website names from the
> getwebsitename function back to
> yield Request(url=titleurls, callback=self.getwebsitename)
>
> On Wed, Apr 16, 2014 at 2:22 PM, Svyatoslav Sydorenko
> <[email protected]> wrote:
>>
>> - yield Request(url=titleurls, callback=self.getwebsitename)
>> + yield Request(url=titleurls, meta={"titlename": some_titlename,
>> +       "standnumber": some_standnumber}, callback=self.getwebsitename)
>>
>> and in getwebsitename you can just access the response.meta dict:
>>
>> http://doc.scrapy.org/en/latest/topics/request-response.html?highlight=meta#scrapy.http.Response.meta
>>
>> On Tuesday, 15 April 2014 at 14:14:32 UTC+3, masroor javed wrote:
>>>
>>> Hi,
>>>
>>> I am new to Scrapy. I just want to know how to call a function and
>>> pass two or three values back from it.
>>> I have a spider; please let me know how to fix it.
>>>
>>> Steps:
>>> 1. I want to scrape all the page links (with pagination) and the stand number.
>>> 2. Hit all those links and extract the website URL.
>>> 3. There should be three values in total: titlename, standnumber and the website URL.
>>>
>>> My spider code is:
>>>
>>> import re
>>> import sys
>>> import unicodedata
>>> from string import join
>>> from scrapy.contrib.spiders import CrawlSpider, Rule
>>> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>>> from scrapy.selector import HtmlXPathSelector
>>> from scrapy.http import Request
>>> from pagitest.items import PagitestItem
>>> from urlparse import urlparse
>>> from urlparse import urljoin
>>>
>>> class InfojobsSpider(CrawlSpider):
>>>     USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0"
>>>     name = "info"
>>>     allowed_domains = ["infosec.co.uk"]
>>>     start_urls = [
>>>         "http://www.infosec.co.uk/exhibitor-directory/"
>>>     ]
>>>     rules = (
>>>         Rule(SgmlLinkExtractor(allow=(r'exhibitor\W+directory'),
>>>              restrict_xpaths=('//li[@class="gButton"]/a')),
>>>              callback='parse_item', follow=True),
>>>     )
>>>
>>>     def parse_item(self, response):
>>>         items = []
>>>         hxs = HtmlXPathSelector(response)
>>>         data = hxs.select('//div[@class="listItemDetail exhibitorDetail"]')
>>>         for titlename in data:
>>>             titleurl = titlename.select('h3[@class="name"]/a/@href').extract()
>>>             for titleurls in titleurl:
>>>                 preg = re.match('^http', titleurls)
>>>                 if preg:
>>>                     titleurls = titleurls
>>>                 else:
>>>                     titleurls = "http://www.infosec.co.uk" + titleurls
>>>                 yield Request(url=titleurls, callback=self.getwebsitename)
>>>
>>>     def getwebsitename(self, response):
>>>         hxs = HtmlXPathSelector(response)
>>>         websites = hxs.select('//li[@class="web"]/a/@href').extract()
>>>         for websitename in websites:
>>>             return websites
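P.P.S. Putting the meta suggestion and the yield fix together, the two callbacks could look roughly like this. This is an untested sketch: the stand-number XPath and the PagitestItem field names are my assumptions, so adjust them to the actual page markup and your item definition.

def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    for entry in hxs.select('//div[@class="listItemDetail exhibitorDetail"]'):
        titlename = entry.select('h3[@class="name"]/a/text()').extract()
        # Hypothetical XPath for the stand number -- check the real markup:
        standnumber = entry.select('.//*[contains(@class, "stand")]/text()').extract()
        links = entry.select('h3[@class="name"]/a/@href').extract()
        if not links:
            continue
        link = links[0]
        if not link.startswith('http'):
            link = urljoin('http://www.infosec.co.uk', link)
        # Carry both values over to the next callback through meta:
        yield Request(url=link,
                      meta={'titlename': titlename,
                            'standnumber': standnumber},
                      callback=self.getwebsitename)

def getwebsitename(self, response):
    hxs = HtmlXPathSelector(response)
    item = PagitestItem()  # field names below are assumptions
    item['titlename'] = response.meta['titlename']
    item['standnumber'] = response.meta['standnumber']
    item['website'] = hxs.select('//li[@class="web"]/a/@href').extract()
    yield item

Alternatively, as the P.S. says, you could add a second Rule that extracts the exhibitor detail links and sends them straight to getwebsitename; CrawlSpider would then schedule those requests for you, though you would need some other way to attach titlename and standnumber to each page.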
