Actually, I spoke too soon; please ignore my last reply. Keeping the display permanently on made the crawl run LONGER than the previous ones, but it still ended in 400 errors after a while. I don't think I'm being banned, because restarting the crawl immediately works fine until the 400 errors eventually come back. I do get more than 1000 requests in before this happens.

On Friday, February 22, 2013 11:01:31 PM UTC-5, Pablo Hoffman wrote:
>
> It does look a lot like a banning issue, but perhaps you can store the
> body of the bogus (400 code) responses to inspect it later; maybe they
> provide more information about what's going wrong. To do that, you can use
> request errbacks or search the docs for "handle_httpstatus_list".
>
> Pablo.
>
> On Fri, Feb 22, 2013 at 8:03 AM, Roman Kolpak <[email protected]> wrote:
>
>> Hey, I know it's a bit outdated, but I have run into the same issue in my
>> spider and it's kinda tricky to debug.
>> Did you by any chance find a solution for the random 400 bad request
>> issue?
>>
>> On Sunday, July 15, 2012, 2:25:18 UTC+3, Trey wrote:
>>
>>> I should also add that I have disabled cookies on the spider and that
>>> has no effect either.
>>>
>>> On Saturday, 14 July 2012 16:21:55 UTC-7, Trey wrote:
>>>>
>>>> I've written a CrawlSpider to scrape a news website's archives and it
>>>> will only get ~200 requests in before it starts receiving 400 Bad Request
>>>> errors. I'm not getting IP banned or even useragent banned, as I can
>>>> immediately shut the bot down, restart it and it will make it another 200
>>>> requests or so. Further, I can immediately view the pages in an open
>>>> browser without changing my IP. I have lowered the number of concurrent
>>>> requests and put a 2 second delay between requests, but nothing seems to
>>>> help. Is there any way to circumvent this? The URLs are not malformed.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
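
For anyone following along, Pablo's suggestion of storing the body of the 400 responses could look roughly like the sketch below. It is only an illustration, not something posted in this thread: the helper name dump_bad_response, the bad_responses directory, and the ArchiveSpider / parse_item names in the comment are all made up for the example. The helper itself is plain Python and only assumes the response object has .status, .url and .body (bytes), which Scrapy's Response objects provide. The commented wiring assumes a reasonably recent Scrapy, where setting handle_httpstatus_list on the spider lets the listed status codes reach the callback instead of being filtered out.

```python
import hashlib
from pathlib import Path


def dump_bad_response(response, out_dir="bad_responses"):
    """Save a response body to disk and return the path of the written file.

    `response` only needs .status, .url and .body (bytes).
    The function name and default directory are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name stays short and filesystem-safe.
    digest = hashlib.sha1(response.url.encode("utf-8")).hexdigest()[:12]
    path = out / f"{response.status}_{digest}.html"
    path.write_bytes(response.body)
    return path


# Hypothetical wiring inside a spider (requires Scrapy, not imported here):
#
#   class ArchiveSpider(CrawlSpider):
#       handle_httpstatus_list = [400]  # let 400 responses reach callbacks
#
#       def parse_item(self, response):
#           if response.status == 400:
#               saved = dump_bad_response(response)
#               self.logger.warning("400 for %s, body saved to %s",
#                                   response.url, saved)
#               return
#           ...  # normal parsing for successful responses
```

The saved files can then be opened after the crawl to see whether the server explains the 400 (for example an oversized header or a rate-limit page). Request errbacks, which Pablo also mentions, are the other route and are not shown here.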
