Fix the domain: `allowed_domains` takes bare domain names, not URLs, so use `allowed_domains = ["dir.uk4net.com"]`. Also note that your rule's callback is `"parse_items"` but the method is named `parse_item` — those names must match or the callback is never called.
On Thursday, 18 December 2014 at 14:08:06 UTC-2, Valentino Hudhra wrote:
>
> Hi,
>
> I want to get every page under http://dir.uk4net.com/.
>
> Here is my code:
>
>     class Uk4NetSpider(CrawlSpider):
>         name = "uk4net"
>         allowed_domains = ["http://dir.uk4net.com/"]
>         start_urls = ["http://dir.uk4net.com/"]
>         rules = (
>             Rule(LxmlLinkExtractor(allow=()), callback="parse_items"),
>         )
>
>         def parse_item(self, response):
>             ...
>
> For some reason, only links from other domains are extracted in the start
> URL. Does this have to do with the fact that all internal URLs are
> relative? If so, how can I capture them?
>
> Thanks,
> Valentino
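For context on why the URL form misbehaves: Scrapy's offsite filtering compares each request's hostname against the entries in `allowed_domains`, and a hostname can never equal a full URL string. A stand-alone, simplified sketch of that comparison (the `is_offsite_allowed` helper is hypothetical, not Scrapy's actual code):

```python
from urllib.parse import urlparse

def is_offsite_allowed(url, allowed_domains):
    # simplified form of the hostname check: the request's hostname
    # must equal an allowed domain or be a subdomain of one
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)

# a URL stored in allowed_domains never matches a hostname:
print(is_offsite_allowed("http://dir.uk4net.com/page",
                         ["http://dir.uk4net.com/"]))  # False
print(is_offsite_allowed("http://dir.uk4net.com/page",
                         ["dir.uk4net.com"]))          # True
```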
