Hi there,

I am a beginner with Python and Scrapy. I tried my luck with Scrapy today 
and wrote a small spider to crawl a few elements from a website. Below is 
the code:

import scrapy
from allrecipe.items import AllrecipeItem


class Spider(scrapy.Spider):
    name = "crawler"
    allowed_domains = ['allrecipes.co.in']
    start_urls = [
        'http://allrecipes.co.in/recipes/searchresults.aspx?text=1%3D1&o_is=Search',
    ]

    def parse(self, response):
        # iterate over one container node per recipe, so each pass extracts
        # that recipe's fields instead of re-extracting the whole list
        for recipe in response.xpath('//*[@id="sectionTopRecipes"]//div'):
            title = "".join(recipe.xpath('div[2]/h3/a/text()').extract())
            title2 = title.replace("\r", "").replace("\n", "").strip()
            print(title2)
            pt = "".join(recipe.xpath('div[4]/p[2]/em/a/text()').extract())
            pt2 = pt.replace("\r", "").replace("\n", "").strip()
            print(pt2)
        # follow each result link separately rather than joining all hrefs
        # into a single (invalid) URL string
        for url in response.xpath(
                '/html/body/div/div[3]/div[1]/div[4]/a[1]/@href').extract():
            yield scrapy.Request(url, callback=self.parse)


I followed one of the examples from the Scrapy documentation.

Now the problem I have is that I don't know how to write this extracted 
data properly to a CSV file. I tried running this command:

scrapy crawl crawler >> data.csv

but this gives me messed-up output.

Please guide me on how to write the recipe title in one column and the 
author (extracted as pt) in the corresponding column.
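In case it helps frame the question: redirecting stdout captures the whole 
crawl log, not just the data, so the usual route is Scrapy's feed exports, 
where the spider yields items and the CSV exporter writes one column per 
field. A minimal sketch of the idea, assuming the spider yields plain dicts 
(supported since Scrapy 1.0); the clean helper is hypothetical, not part of 
the posted code, and just mirrors the replace/strip chain above:

```python
def clean(parts):
    # join the fragment list returned by .extract() and strip stray
    # carriage returns, newlines, and surrounding whitespace
    return "".join(parts).replace("\r", "").replace("\n", "").strip()

# Inside parse(), instead of print, yield one dict per recipe; the CSV
# feed exporter turns each key ("title", "author") into a column header:
#
#     yield {
#         "title": clean(recipe.xpath('div[2]/h3/a/text()').extract()),
#         "author": clean(recipe.xpath('div[4]/p[2]/em/a/text()').extract()),
#     }
```

The export itself would then be requested on the command line rather than 
via shell redirection, e.g. scrapy crawl crawler -o data.csv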

Any help will be highly appreciated.

Regards,
SB




-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.
