Hi Kartik,

The problem may be that the CSV file is never closed, so buffered rows
never get flushed to disk. You can add a close_spider() method to the
CSVWriterPipeline class, keep a reference to the file object returned
by open('CONTENT.csv', 'wb'), and close it there:
http://doc.scrapy.org/en/latest/topics/item-pipeline.html#close_spider
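Something like this (an untested sketch; I've also moved the open() into
open_spider(), and used 'w' mode here, though 'wb' is fine on your Python 2):

```python
import csv

class CSVWriterPipeline(object):

    def open_spider(self, spider):
        # keep a reference to the file object so close_spider() can close it
        # ('wb' in Python 2; 'w' with newline='' in Python 3)
        self.file = open('CONTENT.csv', 'w')
        self.csvwriter = csv.writer(self.file)

    def process_item(self, item, spider):
        self.csvwriter.writerow([item['title'][0], item['desc'][0]])
        return item

    def close_spider(self, spider):
        # closing the file flushes any buffered rows to disk
        self.file.close()
```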

Your parse() logic also looks wrong. Scrapy calls parse() once per
response, i.e. separately for each url in start_urls, so the 'for url
in start_urls:' loop does not make much sense: every item created in
the loop is built from the same response, so they will all be identical.

Cheers,
Jan


On Sat, Jan 11, 2014 at 11:00 AM, Kartik <[email protected]> wrote:
> I have a spider which reads a list of urls from a text file and saves the
> title and body text from each page. The crawl works, but the data does not
> get saved to csv. I set up a pipeline to save to csv because the normal -o
> option did not work for me. I did change settings.py for the pipeline. Any
> help with this would be greatly appreciated! The code is as follows:
>
> Items.py
>
> from scrapy.item import Item, Field
>
> class PrivacyItem(Item):
>     # define the fields for your item here like:
>     # name = Field()
>     title = Field()
>     desc = Field()
>
> PrivacySpider.py
>
> from scrapy.contrib.spiders import CrawlSpider, Rule
> from scrapy.selector import HtmlXPathSelector
> from privacy.items import PrivacyItem
>
> class PrivacySpider(CrawlSpider):
>     name = "privacy"
>     f = open("urls.txt")
>     start_urls = [url.strip() for url in f.readlines()]
>     f.close()
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         items = []
>         for url in start_urls:
>             item = PrivacyItem()
>             item['desc'] = hxs.select('//body//p/text()').extract()
>             item['title'] = hxs.select('//title/text()').extract()
>             items.append(item)
>
>         return items
>
> Pipelines.py
>
> import csv
>
> class CSVWriterPipeline(object):
>
>     def __init__(self):
>         self.csvwriter = csv.writer(open('CONTENT.csv', 'wb'))
>
>     def process_item(self, item, spider):
>         self.csvwriter.writerow([item['title'][0], item['desc'][0]])
>         return item
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/groups/opt_out.
