Hi:
Sorry, I'm not really familiar with Scrapy, but I had to use scrapyjs to
get rendered content.
I noticed you have an example using a plain scrapy.Spider, but I want to use
CrawlSpider. So I wrote this:
import scrapy
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor

class JhsSpider(CrawlSpider):
    name = "jhsspy"
    allowed_domains = ["taobao.com"]
    start_urls = ["https://ju.taobao.com/"]

    rules = [
        Rule(SgmlLinkExtractor(allow=r'https://detail.ju.taobao.com/.*'),
             follow=False),
        Rule(SgmlLinkExtractor(allow=r'https://detail.tmall.com/item.htm.*'),
             callback="parse_link"),
    ]

    def parse_link(self, response):
        # Re-extract the links and request them again through Splash so
        # the callback receives the rendered HTML.
        le = SgmlLinkExtractor()
        for link in le.extract_links(response):
            yield scrapy.Request(link.url, self.parse_item, meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {
                        'wait': 0.5,
                    },
                },
            })

    def parse_item(self, response):
        # ... get items from response ...
        pass
But I ran into some problems that I'm not sure what caused, so I want to know:
is this the right way to yield the requests, like what I did above?
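One alternative I was thinking about: instead of re-extracting the links myself
in parse_link, attach the splash meta through the Rule's process_request hook,
so the requests the CrawlSpider builds from the rules get rendered directly.
Below is a rough sketch of what I mean; the splash_request helper is my own
name, I'm assuming process_request receives the request and returns the
modified one, I set follow=True on the first rule since I want the deal pages
followed, and I added tmall.com to allowed_domains since the item requests
would otherwise be filtered as offsite. Would this be the recommended way?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor

def splash_request(request):
    # Attach the splash meta so the rule-generated request is rendered.
    request.meta['splash'] = {
        'endpoint': 'render.html',
        'args': {'wait': 0.5},
    }
    return request

class JhsSplashSpider(CrawlSpider):
    name = "jhsspy_splash"
    # tmall.com is needed too, or the item requests get dropped as offsite.
    allowed_domains = ["taobao.com", "tmall.com"]
    start_urls = ["https://ju.taobao.com/"]

    rules = [
        # Follow the deal pages to reach the item links.
        Rule(LinkExtractor(allow=r'https://detail.ju.taobao.com/.*'),
             follow=True),
        # Render the item pages through Splash before parsing them.
        Rule(LinkExtractor(allow=r'https://detail.tmall.com/item.htm.*'),
             callback="parse_item",
             process_request=splash_request),
    ]

    def parse_item(self, response):
        # ... get items from the rendered response ...
        pass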