Hello,

I am trying to scrape a website for practice. What I want to accomplish is 
to pull all the companies that are active and write them to a CSV file. My 
code is pasted below. I am not sure what I am doing wrong.

Also, judging by its output, I think the spider is crawling the website 
multiple times. I only want it to crawl the site once each time I run it.

from scrapy.spider import Spider
from scrapy.selector import Selector
from bizzy.items import BizzyItem

class SunSpider(Spider):
    name = "Sun"
    allowed_domains = ['sunbiz.org']
    start_urls = [
        'http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResults/EntityName/a/Page1'
    ]


    def parse(self, response):
        sel = Selector(response)
        sites = sel.xpath('//tbody/tr')
        items = []
        for site in sites:
            item = BizzyItem()
            item["company"] = sel.xpath('//td[1]/a/text()').extract()
            item["status"] = sel.xpath('//td[3]/text()').extract()
            if item["status"] != 'Active':
                pass
            else:
                items.append(item)
        return items
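
Two things in the loop above look suspect. First, sel.xpath('//td[1]/a/text()') is an absolute path evaluated against the whole page, so every pass through the loop picks up the cells of every row rather than the cells of the current row; the row-relative form would be site.xpath('td[1]/a/text()'). Second, extract() returns a list, and a list never compares equal to the string 'Active', so nothing is ever appended. Below is a minimal sketch of the intended per-row logic using only the Python standard library instead of Scrapy. The sample markup, the helper names active_companies and to_csv, and the assumption that the third column holds the exact text "Active" are all illustrative and not taken from the real site.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for the real results page.
SAMPLE = """
<table>
  <tbody>
    <tr><td><a href="#">A CORP</a></td><td>P01</td><td>Active</td></tr>
    <tr><td><a href="#">B LLC</a></td><td>L02</td><td>INACT</td></tr>
    <tr><td><a href="#">C INC</a></td><td>P03</td><td> Active </td></tr>
  </tbody>
</table>
"""


def active_companies(markup):
    """Return one dict per table row whose status cell reads 'Active'."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.findall('.//tbody/tr'):
        # Paths are relative to the current row, not to the whole document.
        company = tr.findtext('td[1]/a', default='').strip()
        status = tr.findtext('td[3]', default='').strip()
        # Compare a single string, not a list of strings.
        if status == 'Active':
            rows.append({'company': company, 'status': status})
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['company', 'status'],
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


ACTIVE = active_companies(SAMPLE)
CSV_TEXT = to_csv(ACTIVE)
```

In the spider itself, the same idea would mean calling xpath on site rather than sel inside the loop, and taking a single stripped string from the extracted list before comparing it.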
