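You probably have an IP ban, so changing the User-Agent won't help: the connection times out at the TCP level, before any HTTP headers are sent. Make your requests from a different IP address.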
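If switching servers isn't practical, one option is to send the requests through a proxy. Below is a rough sketch of a downloader middleware that does that, not a tested setup for kat.cr. The class name, the PROXY_LIST setting, and the proxy URLs are all made up for illustration. The only Scrapy behaviour it relies on is that the downloader honours request.meta["proxy"].

```python
import random


class RotatingProxyMiddleware(object):
    """Hypothetical downloader middleware that picks a proxy per request."""

    def __init__(self, proxies):
        # Proxy URLs such as "http://10.0.0.5:3128" (placeholder values).
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an invented setting name; define it in settings.py.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Leave requests alone if a proxy was already chosen elsewhere.
        if self.proxies and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        # Returning None lets Scrapy continue processing the request.
        return None


# In settings.py (module path and priority are assumptions):
# PROXY_LIST = ["http://10.0.0.5:3128", "http://10.0.0.6:3128"]
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RotatingProxyMiddleware": 740,
# }
```

The priority of 740 is only meant to place it ahead of the built-in HttpProxyMiddleware, so adjust it to fit your project. For a one-off check in the shell you can also set meta={"proxy": ...} on a single Request.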

On Fri, Sep 18, 2015 at 4:14 PM, Ricky Huang <[email protected]> wrote:

> Thank you for the help. I think you are right on kat.cr blocking my
> server. I switched to another server and I was able to crawl the site just
> fine.
>
> I looked in the documentation and I think the correct way to do it is to
> modify “USER_AGENT” in the settings.py file to something like the following:
>
> USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
>
> Is that the correct way to do it? kat is still blocking me with that in
> place. Are there any other setting fields I need to add/change to modify
> my crawler signature?
>
>
> Thanks again.
>
>
> On Sep 18, 2015, at 12:31 PM, Travis Leleu <[email protected]> wrote:
>
> Most likely they are blocking your User-Agent (or possibly IP). This is a
> basic anti-scraping measure, and easily avoidable by altering your scrapy
> UA.
>
> On Fri, Sep 18, 2015 at 11:44 AM, Ricky Huang <[email protected]>
> wrote:
>
>> Hello all,
>>
>> I am building a scraper for Kickass Torrents (kat.cr) for scraping
>> torrent information, etc. I tested it via the shell interface and
>> Scrapy keeps erring out:
>>
>>> fetch("https://kat.cr/south-park-s19e01-720p-hdtv-x264-killers-rartv-t11271450.html")
>>> Traceback (most recent call last):
>>>   File "<console>", line 1, in <module>
>>>   File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 90, in fetch
>>>     reactor, self._schedule, request, spider)
>>>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
>>>     result.raiseException()
>>>   File "<string>", line 2, in raiseException
>>> TCPTimedOutError: TCP connection timed out: 60: Operation timed out.
>>
>>
>> However, I am able to browse the site via a web browser, so it's
>> definitely not the site's fault.
>>
>> Can anyone shed some light on this issue for me?
>>
>>
>> Thanks in advance.

-- 
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
