Hi there,
I am a beginner with Python and Scrapy. I tried my luck with Scrapy today
and wrote a small spider to crawl a few elements from a website. Below is
the code:
    import scrapy

    from allrecipe.items import AllrecipeItem


    class Spider(scrapy.Spider):
        name = "crawler"
        allowed_domains = ['allrecipes.co.in']
        start_urls = [
            'http://allrecipes.co.in/recipes/searchresults.aspx?text=1%3D1&o_is=Search',
        ]

        def parse(self, response):
            # The container that holds the search results
            results = response.xpath("/html/body/div/div[3]/div[1]")
            for result in results:
                # Join the extracted strings and strip stray whitespace
                title = "".join(result.xpath(
                    './/*[@id="sectionTopRecipes"]//div/div[2]/h3/a/text()').extract())
                title = title.replace("\r", "").replace("\n", "").strip()
                print(title)
                pt = "".join(result.xpath(
                    './/*[@id="sectionTopRecipes"]//div/div[4]/p[2]/em/a/text()').extract())
                pt = pt.replace("\r", "").replace("\n", "").strip()
                print(pt)
            # Follow each result link and parse it with the same callback
            for url in response.xpath(
                    '/html/body/div/div[3]/div[1]/div[4]/a[1]/@href').extract():
                yield scrapy.Request(response.urljoin(url), callback=self.parse)
I followed an example from the Scrapy documentation.
The problem I have now is that I don't know how to write this extracted
data properly to a CSV file. I tried running the command

    scrapy crawl crawler >> data.csv

but the output I get from this is messed up.
Please guide me on writing the recipe title in one column and the author
(extracted as pt above) in a corresponding second column.
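To illustrate the layout I am after, here is roughly what I would expect the file to contain, shown with the standard-library csv module on made-up sample rows (the recipe names and authors below are invented, just to show the two-column shape):

```python
import csv
import io

# Made-up (title, author) pairs, standing in for what the spider extracts
rows = [
    ("Chicken Biryani", "Ramya"),
    ("Masala Dosa", "Arjun"),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["title", "author"])  # header row: one column per field
writer.writerows(rows)                # one recipe per line

print(buf.getvalue())
```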
Any help will be highly appreciated.
Regards,
SB
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.