Hi,

Try with allowed_domains = ["monsterboard.nl"] (a bare domain name, not a URL). Otherwise, I'm afraid many of your requests will be filtered as offsite.
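To illustrate why the URL form fails, here is a simplified sketch of the kind of host check an offsite filter performs. It is not Scrapy's actual code: the helper name is_onsite and the sample URL are assumptions made up for this example, and it uses only the Python 3 standard library.

```python
from urllib.parse import urlparse


def is_onsite(url, allowed_domains):
    """Return True if the URL's host is an allowed domain or a subdomain of one.

    Hypothetical helper for illustration only; it mimics the idea of an
    offsite filter, not Scrapy's implementation.
    """
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


# Hypothetical URL on the target site.
url = "http://www.monsterboard.nl/vacatures"

# A bare domain matches the host "www.monsterboard.nl".
print(is_onsite(url, ["monsterboard.nl"]))

# A full URL is never equal to, or a suffix of, a host name, so nothing matches.
print(is_onsite(url, ["http://www.monsterboard.nl/"]))
```

The first call prints True and the second prints False, which is why requests look offsite when allowed_domains holds a URL instead of a domain.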
/Paul.

On Thursday, January 16, 2014 5:00:22 PM UTC+1, pkforlol3 wrote:
>
> *This is my items.py:*
>
> from scrapy.item import Item, Field
>
> class Cvcrawler1(Item):
>     title = Field()
>     titel = Field()
>     strong = Field()
>     paragraaf = Field()
>     link = Field()
>
> *This is my crawlcsv.py:*
>
> from scrapy.spider import BaseSpider
> from scrapy.selector import HtmlXPathSelector
> from cvcrawler1.items import Cvcrawler1
>
> class MySpider(BaseSpider):
>     name = "crawlcsv"
>     allowed_domains = ["http://www.monsterboard.nl/"]
>     f = open("url.txt")
>     start_urls = [url.strip() for url in f.readlines()]
>     f.close()
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         titles = hxs.select("//div[@class='body']")
>         items = []
>         for titles in titles:
>             item = JobCrawlItems()
>             item ["titel"] = titles.select("h2/text()").extract()
>             item ["strong"] = titles.select("p/strong/text()").extract()
>             item ["paragraaf"] = titles.select("p/text()").extract()
>
>         return items
>
> *Command i use in cmd:*
>
> Scrapy crawl crawlcsv -o data.csv -t csv
>
> *How can I save the data it finds? what did i do wrong?*
>
