Help me, please...

On Thu, Apr 17, 2014 at 10:20 AM, masroor javed <[email protected]> wrote:

> Yes, I know, but these links can be extracted with a simple XPath
> expression. I just want to hit all these links to get the website name,
> and then come back to the first page to get the link name and stand name.
> That is: the first page has 12 links, so I have to extract each link name
> and stand name, and then hit the links one by one to get the website name.
> The three values are titlename, standname and websitename.
> I have attached an image in which I marked the titlename and standname.
>
>
>
> On Thu, Apr 17, 2014 at 3:04 AM, Svyatoslav Sydorenko <[email protected]> wrote:
>
>> Then just yield a new Request instead of returning the url.
>>
>> BTW, you should also avoid the double loop. It's possible to extract all
>> the links with a single XPath expression:
>> //div[@class="listItemDetail exhibitorDetail"]/h3[@class="name"]/a/@href
>>
>> P.S. If I understand you right, you may also let scrapy crawl all the
>> links itself and not implement this manually.
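>>
>> For example, the double-loop version of parse_item could be reduced to
>> something like this (a rough, untested sketch reusing your imports and
>> selectors):
>>
>> from urlparse import urljoin
>>
>> def parse_item(self, response):
>>     hxs = HtmlXPathSelector(response)
>>     # one XPath expression grabs every exhibitor link, no nested loops
>>     links = hxs.select('//div[@class="listItemDetail exhibitorDetail"]'
>>                        '/h3[@class="name"]/a/@href').extract()
>>     for href in links:
>>         # urljoin returns absolute hrefs unchanged and resolves
>>         # relative ones, replacing the re.match('^http', ...) check
>>         yield Request(url=urljoin(response.url, href),
>>                       callback=self.getwebsitename)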
>>
>> On Wednesday, 16 April 2014 at 12:34:19 UTC+3, masroor javed wrote:
>>>
>>> Hi Svyatoslav, I just want to return all the website names from the
>>> getwebsitename function back to yield Request(url=titleurls,
>>> callback=self.getwebsitename).
>>>
>>>
>>> On Wed, Apr 16, 2014 at 2:22 PM, Svyatoslav Sydorenko <[email protected]> wrote:
>>>
>>>>
>>>> - yield Request(url=titleurls, callback=self.getwebsitename)
>>>> + yield Request(url=titleurls, meta={"titlename": some_titlename,
>>>>       "standnumber": some_standnumber}, callback=self.getwebsitename)
>>>>
>>>> and in getwebsitename you may just access the response.meta dict:
>>>> http://doc.scrapy.org/en/latest/topics/request-response.html?highlight=meta#scrapy.http.Response.meta
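>>>>
>>>> So getwebsitename could look roughly like this (a sketch, assuming
>>>> your PagitestItem declares titlename, standnumber and websitename
>>>> fields; "titlename" and "standnumber" are the placeholder meta keys
>>>> from the diff above):
>>>>
>>>> def getwebsitename(self, response):
>>>>     hxs = HtmlXPathSelector(response)
>>>>     websites = hxs.select('//li[@class="web"]/a/@href').extract()
>>>>     for website in websites:
>>>>         item = PagitestItem()
>>>>         # the values attached to the request come back via response.meta
>>>>         item["titlename"] = response.meta["titlename"]
>>>>         item["standnumber"] = response.meta["standnumber"]
>>>>         item["websitename"] = website
>>>>         yield item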
>>>>
>>>> On Tuesday, 15 April 2014 at 14:14:32 UTC+3, masroor javed wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am new to scrapy.
>>>>> I just want to know how to call a function and get two or three
>>>>> values back from it.
>>>>> I have a spider; please tell me how to solve this.
>>>>>
>>>>> Steps:
>>>>> 1. I want to scrape all the page links (with pagination) and the stand number.
>>>>> 2. Hit each link and extract the website url.
>>>>> 3. In total there should be 3 values: titlename, standnumber and website url.
>>>>>
>>>>> My spider code is:
>>>>>
>>>>> import re
>>>>> import sys
>>>>> import unicodedata
>>>>> from string import join
>>>>> from scrapy.contrib.spiders import CrawlSpider, Rule
>>>>> from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
>>>>> from scrapy.selector import HtmlXPathSelector
>>>>> from scrapy.http import Request
>>>>> from pagitest.items import PagitestItem
>>>>> from urlparse import urlparse, urljoin
>>>>>
>>>>>
>>>>> class InfojobsSpider(CrawlSpider):
>>>>>     USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; rv:29.0) Gecko/20100101 Firefox/29.0"
>>>>>     name = "info"
>>>>>     allowed_domains = ["infosec.co.uk"]
>>>>>     start_urls = [
>>>>>         "http://www.infosec.co.uk/exhibitor-directory/",
>>>>>     ]
>>>>>     rules = (
>>>>>         Rule(SgmlLinkExtractor(allow=(r'exhibitor\W+directory'),
>>>>>                                restrict_xpaths=('//li[@class="gButton"]/a')),
>>>>>              callback='parse_item', follow=True),
>>>>>     )
>>>>>
>>>>>     def parse_item(self, response):
>>>>>         hxs = HtmlXPathSelector(response)
>>>>>         data = hxs.select('//div[@class="listItemDetail exhibitorDetail"]')
>>>>>         for titlename in data:
>>>>>             titleurl = titlename.select('h3[@class="name"]/a/@href').extract()
>>>>>             for titleurls in titleurl:
>>>>>                 # prepend the domain when the href is relative
>>>>>                 if not re.match('^http', titleurls):
>>>>>                     titleurls = "http://www.infosec.co.uk" + titleurls
>>>>>                 yield Request(url=titleurls, callback=self.getwebsitename)
>>>>>
>>>>>     def getwebsitename(self, response):
>>>>>         hxs = HtmlXPathSelector(response)
>>>>>         websites = hxs.select('//li[@class="web"]/a/@href').extract()
>>>>>         for websitename in websites:
>>>>>             return websites
>>>>>
