Fix the domain: `allowed_domains` takes bare domain names, not URLs, so use `allowed_domains = ["dir.uk4net.com"]`. Also note that your rule's callback is `"parse_items"` but the method is named `parse_item` — those names must match or the callback is never called.
On Thursday, 18 December 2014 at 14:08:06 UTC-2, Valentino Hudhra wrote:
>
> Hi,
>
> I want to get every page under http://dir.uk4net.com/.
>
> Here is my code:
>
>     class Uk4NetSpider(CrawlSpider):
>         name = "uk4net"
>         allowed_domains = ["http://dir.uk4net.com/"]
>         start_urls = ["http://dir.uk4net.com/"]
>         rules = (
>             Rule(LxmlLinkExtractor(allow=()), callback="parse_items"),
>         )
>
>         def parse_item(self, response):
>             ...
>
> For some reason, only links from other domains are extracted in the start
> URL. Does this have to do with the fact that all internal URLs are
> relative? If so, how can I capture them?
>
> Thanks,
> Valentino
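For context on why the URL form misbehaves: Scrapy's offsite filtering compares each request's hostname against the entries in `allowed_domains`, and a hostname can never equal a full URL string. A stand-alone, simplified sketch of that comparison (the `is_offsite_allowed` helper is hypothetical, not Scrapy's actual code):

```python
from urllib.parse import urlparse

def is_offsite_allowed(url, allowed_domains):
    # simplified form of the hostname check: the request's hostname
    # must equal an allowed domain or be a subdomain of one
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)

# a URL stored in allowed_domains never matches a hostname:
print(is_offsite_allowed("http://dir.uk4net.com/page",
                         ["http://dir.uk4net.com/"]))  # False
print(is_offsite_allowed("http://dir.uk4net.com/page",
                         ["dir.uk4net.com"]))          # True
```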
