I'm trying to make a spider that crawls through a website (eventually multiple
websites) and tells me whether the CSS includes any "@media" queries. If none
are found in the internal styling, I'd like it to fetch the external
stylesheets so I can loop through their sources and search those too. Right now
I'm saving the responses in a list so I can loop through them all at once, but
I'm starting to think that's a bad approach. Would anyone mind steering me in
the right direction?
# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request


class SsSpider(scrapy.Spider):
    name = "ss"
    allowed_domains = ["scrapy.org"]  # Example domain
    start_urls = (
        'http://scrapy.org/',
    )

    cssResponses = []
    cssResponseCount = 0

    def parse(self, response):
        cssPaths = response.xpath("//link/@href[contains(., '.css')]").extract()
        cssRequestCount = len(cssPaths)
        for cssPath in cssPaths:
            # urljoin handles stylesheets referenced by relative paths
            yield Request(response.urljoin(cssPath), callback=self.saveCssResponse)
        # This busy-wait is the part that feels wrong: parse() never returns,
        # so Scrapy never gets a chance to actually send the requests yielded
        # above, and cssResponseCount can never catch up to cssRequestCount.
        while cssRequestCount != self.cssResponseCount:
            continue
        # When all responses are received, loop through and determine if CSS is responsive

    def saveCssResponse(self, response):
        self.cssResponses.append(response.body)
        self.cssResponseCount += 1
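
For comparison, here is one way the same check could be done without
accumulating responses in spider state: test the inline <style> blocks first,
and only request the external stylesheets when the page itself has no "@media"
rule; each stylesheet response is then checked on its own, so no shared
counters or waiting are needed. This is only a sketch, assuming Scrapy 1.x
(for response.urljoin and dict items); the spider name and the item fields
("url", "has_media_query") are made up for illustration.

# -*- coding: utf-8 -*-
import scrapy


class MediaQuerySpider(scrapy.Spider):
    name = "mediaquery"  # hypothetical name
    start_urls = ['http://scrapy.org/']

    def parse(self, response):
        # Check the internal styling first.
        inline_css = " ".join(response.xpath("//style/text()").extract())
        if "@media" in inline_css:
            yield {"url": response.url, "has_media_query": True}
            return
        # No inline match: request each external stylesheet, carrying the
        # page URL along in meta so the callback knows where it came from.
        for href in response.xpath("//link[contains(@rel, 'stylesheet')]/@href").extract():
            yield scrapy.Request(response.urljoin(href),
                                 callback=self.parse_css,
                                 meta={"page_url": response.url})

    def parse_css(self, response):
        # Each stylesheet is checked independently; no shared state needed.
        if b"@media" in response.body:
            yield {"url": response.meta["page_url"], "has_media_query": True}

The one case this doesn't cover cleanly is concluding that a page is *not*
responsive, since that means knowing all of its stylesheets have been checked;
for that, carrying a per-page pending count in meta (decremented in parse_css)
or aggregating in a spider_closed handler would fit Scrapy's asynchronous
model better than blocking inside parse.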