If I don't find "No Information" in response.body, I want to write the good
URLs to a file. I am struggling to build the filter.
Also, maybe there is a better way of storing the good URLs and then
crawling back through them once the raw_urls have been filtered?
def start_requests(self):
    raw_urls = generate_result_urls(self.YEAR, self.YEARS)
    for url in raw_urls:
        yield scrapy.Request(url=url, callback=self.parse)

def parse(self, response):
    # Filter: a page is "good" if its body never says "No Information."
    if b"No Information." not in response.body:
        # Append the good URL to the file.
        with open('goodurls.txt', 'a') as f:
            f.write(response.url + '\n')
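
On the second question: rather than writing good URLs to a file and then
re-crawling from it, you can keep them in memory on the spider and chain
the follow-up request straight from parse. A minimal sketch, assuming
generate_result_urls, self.YEAR and self.YEARS from your snippet;
parse_detail is a hypothetical second-pass callback you would replace
with your own:

import scrapy

class ResultsSpider(scrapy.Spider):
    name = 'results'

    def start_requests(self):
        # Collected good URLs live on the spider instead of on disk.
        self.good_urls = []
        for url in generate_result_urls(self.YEAR, self.YEARS):
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        if b"No Information." not in response.body:
            self.good_urls.append(response.url)
            # dont_filter=True is needed because Scrapy's dupefilter
            # would otherwise drop a second request to the same URL.
            yield scrapy.Request(response.url, callback=self.parse_detail,
                                 dont_filter=True)

    def parse_detail(self, response):
        # Hypothetical second pass over a page known to have information.
        self.logger.info('good page: %s', response.url)

    def closed(self, reason):
        # Optionally still persist the filtered list when the crawl ends.
        with open('goodurls.txt', 'w') as f:
            f.write('\n'.join(self.good_urls) + '\n')

Note that the follow-up request re-downloads the page; if the second pass
only needs the same response you already have, you can skip the extra
request and do that parsing directly inside parse.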