Thank you for the help. I think you are right about kat.cr blocking my server. I 
switched to another server and was able to crawl the site just fine.

I looked in the documentation, and I think the correct way to change the user agent 
is to set “USER_AGENT” in the settings.py file to something like the following:

USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"

Is that the correct way to do it? kat is still blocking me with that in place. 
Are there any other settings I need to add or change to modify my crawler's 
signature?
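A minimal settings.py sketch, assuming a standard Scrapy project layout. USER_AGENT, DEFAULT_REQUEST_HEADERS, DOWNLOAD_DELAY, RANDOMIZE_DOWNLOAD_DELAY and DOWNLOAD_TIMEOUT are real Scrapy settings that affect how the crawler presents itself; the specific values below are illustrative assumptions, not recommendations from this thread. None of them changes the IP address the crawler connects from, so they would not help against an IP-based block.

```python
# settings.py (sketch): settings that influence how the crawler looks to a site.
# Values are illustrative assumptions, not tuned for any particular site.

# Adjacent string literals inside parentheses are joined into one string,
# which keeps the long UA readable; a plain string literal broken across
# lines is a syntax error.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)

# Headers sent with every request unless a Request overrides them.
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en",
}

# Slow down and add jitter so the request pattern looks less bot-like.
DOWNLOAD_DELAY = 2  # seconds between requests to the same site
RANDOMIZE_DOWNLOAD_DELAY = True  # waits between 0.5x and 1.5x DOWNLOAD_DELAY

# Seconds to wait before giving up on a download (Scrapy's default is 180).
DOWNLOAD_TIMEOUT = 30
```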


Thanks again.


> On Sep 18, 2015, at 12:31 PM, Travis Leleu <[email protected]> wrote:
> 
> Most likely they are blocking your User-Agent (or possibly IP).  This is a 
> basic anti-scraping measure, and easily avoidable by altering your scrapy UA.
> 
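Because the User-Agent is just an HTTP request header, one way to check whether the UA is what triggers a block is to request the page outside Scrapy with a browser-like UA. A minimal sketch using only Python 3's standard `urllib.request`; `build_request` and `fetch_status` are hypothetical helper names, and the UA string is the one quoted in this thread.

```python
import urllib.request
from urllib.error import HTTPError, URLError

# Browser-like UA string quoted earlier in this thread.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2227.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Build a GET request that carries the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent=BROWSER_UA, timeout=15):
    """Hypothetical helper: return the HTTP status code or an error string.

    Any HTTP status (even 403) means the TCP connection succeeded, so a
    block, if any, happened at the HTTP level where the User-Agent matters.
    """
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # Error statuses such as 403 Forbidden are raised as HTTPError.
        return err.code
    except URLError as err:
        # No HTTP response at all, e.g. DNS failure or TCP connect timeout.
        return "connection failed: {}".format(err.reason)
    except OSError as err:
        # Connected, but the response was not read in time.
        return "read failed: {}".format(err)
```

Comparing `fetch_status(url)` with `fetch_status(url, user_agent="Scrapy/1.0 (+http://scrapy.org)")` from the same machine would show whether the UA alone changes the response; a connection failure in both cases points away from UA filtering.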
> On Fri, Sep 18, 2015 at 11:44 AM, Ricky Huang <[email protected]> wrote:
> Hello all,
> 
> I am building a scraper for Kickass Torrents (kat.cr) to scrape torrent 
> information, etc. I tested it in the Scrapy shell, and Scrapy keeps 
> erroring out:
> 
> >>> fetch("https://kat.cr/south-park-s19e01-720p-hdtv-x264-killers-rartv-t11271450.html")
> Traceback (most recent call last):
>   File "<console>", line 1, in <module>
>   File "/usr/local/lib/python2.7/site-packages/scrapy/shell.py", line 90, in fetch
>     reactor, self._schedule, request, spider)
>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
>     result.raiseException()
>   File "<string>", line 2, in raiseException
> TCPTimedOutError: TCP connection timed out: 60: Operation timed out.
> 
> However, I am able to browse the site via a web browser, so it's definitely 
> not the site's fault.
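The User-Agent header is only sent after the TCP handshake completes, so a `TCPTimedOutError` like the one above means the connection never got that far; that pattern fits an IP- or network-level block better than User-Agent filtering. A minimal sketch, using only Python's standard `socket` module, that tests whether a TCP connection to a host can be opened at all from a given machine; `can_connect` is a hypothetical helper name.

```python
import socket


def can_connect(host, port=443, timeout=10.0):
    """Return True if a TCP connection to (host, port) opens within timeout.

    No HTTP request is sent, so the result does not depend on the
    User-Agent: False on one machine but True on another suggests the
    block is tied to the network or IP address, not the crawler's headers.
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False
```

For example, running `can_connect("kat.cr")` on the original server and on the one that worked would be a quick way to check the IP-block theory.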
> 
> Can anyone shed some light on this issue for me?
> 
> 
> Thanks in advance.
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.