CrawlSpider is not meant to be used that way. The start URL is crawled with 
the default parse callback, and the rule is then applied to that response, 
looking for links to the same URL again, which makes no sense. Use the 
scrapy.spider.Spider class instead and build the item in its parse method. 
Also, you should remove the www. from allowed_domains.

On Friday, August 22, 2014 at 19:22:26 UTC-3, Hang Li wrote:
>
> Hello I am trying out Scrapy. But for one domain, ShoeScribe, the 
> parse_item is not called for. With same code, it works fine with other 
> domain. Totally no idea why. Any help will be really appreciated!
>
> import scrapy
> from scrapy import log
> from scrapy.contrib.spiders import CrawlSpider, Rule
> from scrapy.contrib.linkextractors import LinkExtractor
> from lsspider.items import *
>
> class ShoeScribeSpider(CrawlSpider):
>     name = "shoescribe"
>     merchant_name = "shoescribe.com"
>     allowed_domains = ["www.shoescribe.com"]
>
>     start_urls = [
>         "http://www.shoescribe.com/us/women/ankle-boots_cod44709699mx.html",
>     ]
>
>     rules = (
>         Rule(LinkExtractor(allow=('http://www.shoescribe.com/us/women/ankle-boots_cod44709699mx.html')),
>              callback='parse_item', follow=True),
>     )
>
>     def parse_item(self, response):
>         print 'parse_item'
>
>         item = Item()
>         item['url'] = response.url.split('?')[0]
>
>         print item['url']
>         return item
>

