Hi,

I'm new to Scrapy. Recently I have been trying to use ScrapyJS to scrape
items rendered by JavaScript, but I'm not very familiar with either Scrapy
or ScrapyJS. My spider crawls very slowly and returns duplicate results.

This is how I wrote my spider:

# -*- coding: utf-8 -*-
import scrapy
from scrapy.selector import Selector
from scrapy.contrib.spiders import CrawlSpider,Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from jhsspy.items import JhsspyItem


class JhsSpider(CrawlSpider):
    name = "jhsspy"
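    # tmall.com is listed as well because the second rule targets
    # detail.tmall.com pages, which are not under taobao.com.
    allowed_domains = ["taobao.com", "tmall.com"]
#    start_urls = ["http://sale.yohobuy.com/?specialsale_id=7&gender=1,3"]
    start_urls = ["https://ju.taobao.com/"]
    rules = [
        # Fetch detail.ju.taobao.com pages, but do not extract links from them.
        Rule(SgmlLinkExtractor(allow=(r'https://detail.ju.taobao.com/.*',)),
             follow=False),
        # Pass detail.tmall.com item pages to parse_link.
        Rule(SgmlLinkExtractor(allow=(r'https://detail.tmall.com/item.htm.*',)),
             callback="parse_link"),
    ]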

    def parse_link(self, response):
        le = SgmlLinkExtractor()
        for link in le.extract_links(response):
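            # Route each extracted link through Splash so parse_item
            # receives the JavaScript-rendered page.
            yield scrapy.Request(link.url, self.parse_item, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 0.5},
                }
            })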


    def parse_item(self, response):
        sel = Selector(response)
        item = JhsspyItem()
        # ... get items ...






I would like to know whether yielding requests from the rule's callback
like this is the right approach.
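
To make the question concrete, the per-link work in parse_link can be
sketched in plain Python, without Scrapy. This is only an illustration
under assumptions: build_splash_meta and unique_urls are hypothetical
helper names, not Scrapy or ScrapyJS APIs, the item URLs are made up, and
skipping repeated URLs before yielding is just one guess at where the
duplicate results might come from.

```python
def build_splash_meta(wait=0.5, endpoint="render.html"):
    """Return the meta dict that parse_link attaches to each request."""
    return {"splash": {"endpoint": endpoint, "args": {"wait": wait}}}


def unique_urls(urls, seen=None):
    """Yield each URL once, in first-seen order.

    Pass the same `seen` set on every call to skip URLs that were
    already yielded for an earlier page.
    """
    seen = set() if seen is None else seen
    for url in urls:
        if url not in seen:
            seen.add(url)
            yield url


if __name__ == "__main__":
    # Made-up example links; the first one appears twice.
    links = [
        "https://detail.tmall.com/item.htm?id=1",
        "https://detail.tmall.com/item.htm?id=2",
        "https://detail.tmall.com/item.htm?id=1",
    ]
    for url in unique_urls(links):
        print(url, build_splash_meta())
```

In the spider, the same idea would mean keeping one `seen` set on the
spider instance and only yielding a request for URLs that pass through
it.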
