Actually, I spoke too soon. Please ignore my last reply. Keeping the 
display permanently on made the crawl run LONGER than the previous ones, 
but it still ended in 400 errors after a while. I don't think I'm being 
banned, because immediately restarting the crawl works fine until I 
eventually get 400 errors again. I do get over 1000 pages crawled before 
this happens.
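
For reference, the suggestion quoted below (keeping the body of the 400 
responses so it can be inspected later) might look roughly like this. It is 
only a sketch, assuming a recent Scrapy release where spiders have 
self.logger; the spider name, start URL, output directory and the 
save_bad_response helper are made up for illustration. 
handle_httpstatus_list is the spider attribute Pablo mentions.

```python
import hashlib
from pathlib import Path

try:
    import scrapy
except ImportError:  # the helper below can still be tried without Scrapy
    scrapy = None


def save_bad_response(response, out_dir="bad_responses"):
    """Write a response body to disk and return the path of the file.

    `response` only needs `.status`, `.url` and `.body` (bytes), which
    Scrapy's Response objects provide. The helper itself is hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Hash the URL so the file name stays short and filesystem-safe.
    digest = hashlib.sha1(response.url.encode("utf-8")).hexdigest()[:12]
    path = out / f"{response.status}_{digest}.html"
    path.write_bytes(response.body)
    return path


if scrapy is not None:

    class ArchiveDebugSpider(scrapy.Spider):
        name = "archive_debug"  # hypothetical spider name
        start_urls = ["http://example.com/archive/"]  # placeholder URL

        # Let 400 responses reach the callback instead of being
        # filtered out before the spider sees them.
        handle_httpstatus_list = [400]

        def parse(self, response):
            if response.status == 400:
                path = save_bad_response(response)
                self.logger.warning(
                    "400 for %s, body saved to %s", response.url, path
                )
                return
            # ... normal parsing of successful pages goes here ...
```

Running something like this until the errors start should leave one file 
per failing URL under bad_responses/, which can then be compared with what 
a browser gets for the same URL.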

On Friday, February 22, 2013 11:01:31 PM UTC-5, Pablo Hoffman wrote:
>
> It does look a lot like a banning issue, but perhaps you can store the 
> body of the bogus (400 code) responses to inspect it later, maybe they 
> provide more information into what's going wrong. To do that, you can use 
> request errbacks or search the doc for "handle_httpstatus_list".
>
> Pablo.
>
>
> On Fri, Feb 22, 2013 at 8:03 AM, Roman Kolpak <[email protected]> wrote:
>
>> Hey, I know this thread is a bit old, but I have run into the same issue 
>> in my spider and it's kinda tricky to debug.
>> Did you by any chance find a solution for the random 400 Bad Request 
>> issue?
>>
>> On Sunday, July 15, 2012 2:25:18 AM UTC+3, Trey wrote:
>>
>>> I should also add that I have disabled cookies on the spider and that 
>>> has no effect either.
>>>
>>> On Saturday, 14 July 2012 16:21:55 UTC-7, Trey wrote:
>>>>
>>>> I've written a CrawlSpider to scrape a news website's archives and it 
>>>> will only get ~200 requests in before it starts receiving 400 Bad Request 
>>>> errors. I'm not getting IP banned or even useragent banned, as I can 
>>>> immediately shut the bot down, restart it and it will make it another 200 
>>>> requests or so. Further, I can immediately view the pages in an open 
>>>> browser without changing my IP. I have lowered the number of concurrent 
>>>> requests and put a 2-second delay between requests, but nothing seems to 
>>>> help. Is there any way to circumvent this? The URLs are not malformed.
>>>
>>>  -- 
>> You received this message because you are subscribed to the Google Groups 
>> "scrapy-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>>
>> To post to this group, send email to [email protected].
>> Visit this group at http://groups.google.com/group/scrapy-users?hl=en.
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
