Thank you for the help. I think you are right about kat.cr blocking my server. I switched to another server and was able to crawl the site just fine.
I looked in the documentation, and I think the correct way to do it is to set USER_AGENT in the settings.py file to something like the following:

    USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

Is that the correct way to do it? kat.cr is still blocking me with that in place. Are there any other settings I need to add or change to modify my crawler's signature?

Thanks again.

> On Sep 18, 2015, at 12:31 PM, Travis Leleu <[email protected]> wrote:
>
> Most likely they are blocking your User-Agent (or possibly IP). This is a
> basic anti-scraping measure, and easily avoidable by altering your scrapy UA.
>
> On Fri, Sep 18, 2015 at 11:44 AM, Ricky Huang <[email protected]> wrote:
> Hello all,
>
> I am building a scraper for Kickass Torrents (kat.cr) to scrape torrent
> information. I tested it via the shell interface, and Scrapy keeps
> erroring out:
>
> >>> fetch("https://kat.cr/south-park-s19e01-720p-hdtv-x264-killers-rartv-t11271450.html")
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 90, in fetch
>     reactor, self._schedule, request, spider)
>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
>     result.raiseException()
>   File "<string>", line 2, in raiseException
> TCPTimedOutError: TCP connection timed out: 60: Operation timed out.
>
> However, I am able to browse the site via a web browser, so it's definitely
> not the site's fault.
>
> Can anyone shed some light on this issue for me?
>
> Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
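For reference, here is a minimal sketch of the settings.py fields that shape the crawler's signature, on the question of which other settings matter besides USER_AGENT. The setting names are standard Scrapy settings; the specific values (header contents, the 30-second timeout, the 2-second delay) are illustrative assumptions, not recommendations from this thread.

```python
# settings.py (sketch; values below are assumptions chosen for illustration)

# Browser-like User-Agent string, as quoted earlier in this thread.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)

# Headers sent with every request unless a request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Give up on a download after this many seconds (assumed value).
DOWNLOAD_TIMEOUT = 30

# Wait this many seconds between requests to the same site (assumed value).
DOWNLOAD_DELAY = 2
```

Note that these settings only change what is sent once a connection exists. A TCPTimedOutError like the one in the traceback is raised before any HTTP header is transmitted, so a User-Agent change alone would not be expected to clear it. That is consistent with the crawl working from a different server.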
