Hello,

I'm having an odd issue with one of my projects.

I have implemented a custom middleware that rotates the User-Agent header on each request.

When the middleware is initialized, it reads the user agents from a file and stores them in an in-memory list.

As far as I can tell this should work, but a large number of my crawl requests come back with 400 Bad Request. The odd thing is that everything works fine if I put the agents directly in a list instead of reading them from the file.
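One concrete difference between the two setups is worth checking: `readlines()` keeps the newline at the end of every line, so each entry in the list is the agent string plus `\n`, whereas a list typed directly in the code has no newline. A header value ending in a line break would be a plausible (though unconfirmed) source of malformed requests. The small sketch below shows the difference, using a temporary file in place of `agents.txt` and placeholder agent strings:

```python
import os
import tempfile

# Write two placeholder agents to a temporary file standing in for agents.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("agent-a\nagent-b\n")

with open(path) as f:
    raw = f.readlines()  # trailing newlines are kept

with open(path) as f:
    # strip() drops the newline; the condition also skips blank lines
    clean = [line.strip() for line in f if line.strip()]

os.remove(path)
print(raw)    # ['agent-a\n', 'agent-b\n']
print(clean)  # ['agent-a', 'agent-b']
```

Only `clean` matches what a hand-written list would contain.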

What could be causing this? Here is my middleware:

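import os

from scrapy import log  # legacy Scrapy logging API, as used in this project


class UserAgentPool(object):
    def __init__(self):
        # agents.txt lives next to this module, one user agent per line
        basepath = os.path.dirname(__file__)
        filepath = os.path.abspath(os.path.join(basepath, "agents.txt"))
        with open(filepath, 'r') as f:
            # note: readlines() keeps the trailing '\n' on every entry
            self.agents = f.readlines()

    def rotate(self):
        # round-robin: take the first agent and move it to the end of the list
        log.msg("Rotating user agent", level=log.DEBUG)
        agent = self.agents.pop(0)
        log.msg("Agent popped %s" % agent, level=log.DEBUG)
        log.msg("[%s]" % ", ".join(map(str, self.agents)), level=log.DEBUG)
        self.agents.append(agent)
        return agent


class UserAgentRotationMiddleware(object):
    def __init__(self):
        self.pool = UserAgentPool()

    def process_request(self, request, spider):
        # only rotate for spiders that set a truthy agent_rotation attribute
        if getattr(spider, 'agent_rotation', None):
            agent = self.pool.rotate()
            # setdefault only sets the header if it is not already present
            request.headers.setdefault('User-Agent', agent)
            log.msg("Setting User-Agent to %s" % request.headers["User-Agent"])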
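Because `readlines()` leaves the trailing `\n` on every entry, one option is to strip each line while loading. The sketch below shows such a pool in plain Python so it can be tried outside Scrapy. `CleanUserAgentPool` and its `path` parameter are hypothetical names, the logging calls are left out, and a temporary file stands in for `agents.txt`. It uses `collections.deque.rotate(-1)`, which gives the same round-robin order as `pop(0)` followed by `append()`.

```python
import os
import tempfile
from collections import deque


class CleanUserAgentPool(object):
    """Hypothetical variant of UserAgentPool that strips lines on load."""

    def __init__(self, path):
        with open(path) as f:
            # strip() removes the trailing newline (and stray spaces or '\r');
            # blank lines are skipped so an empty User-Agent is never returned.
            self.agents = deque(line.strip() for line in f if line.strip())
        if not self.agents:
            raise ValueError("no user agents found in %s" % path)

    def rotate(self):
        """Round-robin: return the front agent and move it to the back."""
        agent = self.agents[0]
        self.agents.rotate(-1)
        return agent


# Try it against a temporary file that ends with a blank line.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("agent-a\nagent-b\n\n")

pool = CleanUserAgentPool(path)
os.remove(path)
picked = [pool.rotate() for _ in range(3)]
print(picked)  # ['agent-a', 'agent-b', 'agent-a']
```

Loading and cleaning once in `__init__` keeps `rotate()` cheap, which matters because it runs for every request.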


-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.