Hi,
Sorry, I'm not really familiar with Scrapy yet, but I need to use scrapyjs 
to get rendered content.
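For context, Splash is running locally and my settings follow the scrapyjs 
README as far as I can tell (the URL below is just a placeholder for my own 
instance):

SPLASH_URL = 'http://localhost:8050'
DOWNLOADER_MIDDLEWARES = {
    'scrapyjs.SplashMiddleware': 725,
}
DUPEFILTER_CLASS = 'scrapyjs.SplashAwareDupeFilter'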
I noticed that you have a Scrapy Spider example, but I want to use 
CrawlSpider, so I wrote this:


import scrapy
# Scrapy < 1.0 import paths; on 1.0+ these live in scrapy.spiders
# and scrapy.linkextractors instead.
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class JhsSpider(CrawlSpider):
    name = "jhsspy"
    # tmall.com listed as well, otherwise the offsite middleware
    # filters out the detail.tmall.com requests.
    allowed_domains = ["taobao.com", "tmall.com"]
    start_urls = ["https://ju.taobao.com/"]
    rules = [
        # Match deal pages on detail.ju.taobao.com; don't follow
        # links found on them.
        Rule(SgmlLinkExtractor(allow=(r'https://detail.ju.taobao.com/.*',)),
             follow=False),
        # Match Tmall item pages and hand them to parse_link.
        Rule(SgmlLinkExtractor(allow=(r'https://detail.tmall.com/item.htm.*',)),
             callback="parse_link"),
    ]

    def parse_link(self, response):
        le = SgmlLinkExtractor()
        for link in le.extract_links(response):
            # Re-request each extracted link through Splash so that
            # parse_item sees the JavaScript-rendered HTML.
            yield scrapy.Request(link.url, self.parse_item, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {
                        'wait': 0.5,
                    },
                },
            })

    def parse_item(self, response):
        # ... get items from the response ...
        pass

 

But I ran into some problems and I'm not sure what caused them, so I'd like 
to know: is it right to yield the requests the way I did above?
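
I also wondered whether the splash meta should instead be attached in the 
rules themselves, via the Rule's process_request hook. Something like this 
untested sketch (splash_request is just a helper name I made up):

def splash_request(request):
    # Hypothetical helper: route every rule-extracted request
    # through Splash instead of re-requesting it in a callback.
    request.meta['splash'] = {
        'endpoint': 'render.html',
        'args': {'wait': 0.5},
    }
    return request

and then pass process_request=splash_request in the second Rule. Is one of 
these approaches preferred?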
