Thanks for bringing this up. I answered on SO. As a methodology, I would say:
try to make the simplest working thing possible and then build up towards the
more complex code you have. See at which point it breaks. Is it when you add an
API call? Is it when you return something? What I did was to replace your
queue() with this, and it seems to work:
    def queue(self):
        return 'http://www.example.com/?{}'.format(random.randint(0, 100000))
What can we infer from this?
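One plausible inference (an assumption from the symptoms, not something the code proves by itself) is that the loop shape is fine and the problem is the blocking SQS poll: the network call plus time.sleep(10) inside start_requests() blocks Scrapy's single-threaded Twisted reactor, so downloaded responses never reach parse() until the generator is interrupted (e.g. by CTRL+C). In plain Python, an infinite start_requests-style generator like yours works fine as long as each pop returns promptly — a minimal sketch using a stub in place of the SQS call:

```python
import itertools
import random


def queue_stub():
    # Non-blocking stand-in for the SQS-backed queue(): returns a fresh
    # URL on every call, like the replacement suggested above.
    return 'http://www.example.com/?{}'.format(random.randint(0, 100000))


def start_requests_stub():
    # Mirrors the spider's start_requests(): an infinite generator that
    # pops one URL per iteration.
    while True:
        yield queue_stub()


# Consuming a few items works because each pop returns immediately; a pop
# that blocks on the network or sleeps for 10 seconds would stall the
# consumer (in Scrapy's case, the reactor) for that entire time instead.
urls = list(itertools.islice(start_requests_stub(), 3))
```

So the generator itself isn't the suspect; the next thing to test would be the SQS call in isolation.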
From: [email protected]
Date: Thu, 16 Jun 2016 13:43:28 -0400
Subject: Trying to read from message queue, not parsing response in
make_requests_from_url loop
To: [email protected]
I have this question on SO, but no answers unfortunately. Figured I'd try my
luck here.
https://stackoverflow.com/questions/37770678/scrapy-not-parsing-response-in-make-requests-from-url-loop
I'm trying to get Scrapy to grab a URL from a message queue and then scrape
that URL. I have the loop going just fine and grabbing the URL from the queue,
but it never enters the parse() method once it has a URL; it just continues to
loop (and sometimes the URL comes back around even though I've deleted it from
the queue...). While it's running in the terminal, if I press CTRL+C and force
it to end, it enters the parse() method and crawls the page, then ends. I'm not
sure what's wrong here. Scrapy needs to be running at all times to catch a URL
as it enters the queue. Has anyone got ideas, or done something like this?
import time

import boto.sqs
from scrapy import Spider


class my_Spider(Spider):
    name = "my_spider"
    allowed_domains = ['domain.com']

    def __init__(self):
        super(my_Spider, self).__init__()
        self.url = None

    def start_requests(self):
        while True:
            # Crawl the url from queue
            yield self.make_requests_from_url(self._pop_queue())

    def _pop_queue(self):
        # Grab the url from queue
        return self.queue()

    def queue(self):
        url = None
        while url is None:
            conf = {
                "sqs-access-key": "",
                "sqs-secret-key": "",
                "sqs-queue-name": "crawler",
                "sqs-region": "us-east-1",
                "sqs-path": "sqssend"
            }
            # Connect to AWS
            conn = boto.sqs.connect_to_region(
                conf.get('sqs-region'),
                aws_access_key_id=conf.get('sqs-access-key'),
                aws_secret_access_key=conf.get('sqs-secret-key')
            )
            q = conn.get_queue(conf.get('sqs-queue-name'))
            message = conn.receive_message(q)
            # Didn't get a message back, wait.
            if not message:
                time.sleep(10)
                url = None
            else:
                url = message
        if url is not None:
            message = url[0]
            message_body = str(message.get_body())
            message.delete()
            self.url = message_body
            return self.url

    def parse(self, response):
        ...
        yield item
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.