Hello,
I am trying to scrape a website for practice and basically what I am trying
to accomplish is to pull all the companies that are active and download
them to a CSV file. You can see my code pasted below. I am not sure what I
am doing wrong.
Also I think the spider is crawling the website multiple times based on its
output. I only want it to crawl the site once every time I run it.
from scrapy.spider import Spider
from scrapy.selector import Selector
from bizzy.items import BizzyItem
class SunSpider(Spider):
name = "Sun"
allowed_domains = ['sunbiz.org']
start_urls = [
'http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResults/EntityName/a/Page1'
]
def parse(self, response):
sel = Selector(response)
sites = sel.xpath('//tbody/tr')
items = []
for site in sites:
item = BizzyItem()
item["company"] = sel.xpath('//td[1]/a/text()').extract()
item["status"] = sel.xpath('//td[3]/text()').extract()
if item["status"] != 'Active':
pass
else:
items.append(item)
return items
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/d/optout.