Sure. Below is the log of a crawl of http://doc.scrapy.org with a depth of 1, extracting inlinks only:
http://pastebin.com/wE292pQe

As you can see from the stats, the status 200 count is only 13. This is not the case if I put my agent list directly in my module or if I disable my middleware.

Thanks

On Friday, 21 March 2014 at 10:55:26 UTC+1, Paul Tremberth wrote:
>
> Can you share logs?
>
> On Fri, Mar 21, 2014 at 10:53 AM, James Ford wrote:
> > Hello,
> >
> > I'm having an odd issue with one of my projects.
> >
> > I have implemented a custom middleware that rotates the user agent for
> > each request.
> >
> > The middleware works by reading from a file when it is initialized and
> > putting the contents of the file into a list (in memory).
> >
> > As far as I can tell this should work fine, but I am getting a large
> > number of 400 Bad Request responses in my crawls. The odd thing is that
> > it works fine if I just put the agents in a list directly instead of
> > reading them from a file.
> >
> > What can cause this error? Here is my middleware:
> >
> > import os
> >
> > from scrapy import log
> >
> > class UserAgentPool():
> >     def __init__(self):
> >         basepath = os.path.dirname(__file__)
> >         filepath = os.path.abspath(os.path.join(basepath, "agents.txt"))
> >         with open(filepath, 'r') as f:
> >             self.agents = f.readlines()
> >
> >     def rotate(self):
> >         log.msg("Rotating user agent", level=log.DEBUG)
> >         agent = self.agents.pop(0)
> >         log.msg("Agent popped %s" % agent, level=log.DEBUG)
> >         log.msg("[%s]" % ", ".join(map(str, self.agents)), level=log.DEBUG)
> >         self.agents.append(agent)
> >         return agent
> >
> > class UserAgentRotationMiddleware(object):
> >     def __init__(self):
> >         self.pool = UserAgentPool()
> >
> >     def process_request(self, request, spider):
> >         if getattr(spider, 'agent_rotation', None):
> >             agent = self.pool.rotate()
> >             request.headers.setdefault('User-Agent', agent)
> >             log.msg("Setting User-Agent to %s" % request.headers["User-Agent"])
> >
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "scrapy-users" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > To post to this group, send email to [email protected].
> > Visit this group at http://groups.google.com/group/scrapy-users.
> > For more options, visit https://groups.google.com/d/optout.
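A likely cause, offered as a hypothesis rather than a confirmed diagnosis: f.readlines() keeps the trailing newline on every line it returns, so each agent read from agents.txt ends in "\n" (except possibly the last one, if the file has no final newline). Strings typed directly into a list carry no such newline, which would fit the observation that the hard-coded list works. A newline inside a header value can break the raw HTTP request, and servers commonly answer a malformed request with 400 Bad Request. The sketch below strips each line and skips blank ones. The load_agents helper and the optional filepath argument are hypothetical additions for illustration, and the log.msg calls are left out so the example runs without Scrapy.

```python
import os


def load_agents(filepath):
    """Hypothetical helper: read one user-agent string per line.

    Unlike f.readlines(), this drops the trailing newline from each line,
    so the returned strings are safe to use as a single-line header value.
    """
    with open(filepath, "r") as f:
        # strip() removes "\n" and surrounding whitespace; the condition
        # skips blank lines such as an empty line at the end of the file.
        return [line.strip() for line in f if line.strip()]


class UserAgentPool(object):
    def __init__(self, filepath=None):
        # Hypothetical optional argument; by default, look for agents.txt
        # next to this module, as the original middleware does.
        if filepath is None:
            basepath = os.path.dirname(os.path.abspath(__file__))
            filepath = os.path.join(basepath, "agents.txt")
        self.agents = load_agents(filepath)

    def rotate(self):
        # Round-robin: take the first agent and move it to the back.
        agent = self.agents.pop(0)
        self.agents.append(agent)
        return agent
```

The middleware can keep calling the pool the same way, e.g. request.headers.setdefault('User-Agent', self.pool.rotate()); the only intended difference is that the values no longer end in a newline.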
