Hello,
I'm working on a Scrapy project and I would like to fill start_urls with a list
of URLs stored in a database.
I tried the code below, but it doesn't work: it only takes the last URL in the
database :-(
Please help me. Thank you in advance.

[CODE]

class Crawl2Spider(BaseSpider):
    name = "crawl2"
    import MySQLdb
    db = MySQLdb.connect(host="localhost", user="root", passwd="",
                         db="crawler_engine", charset='utf8', use_unicode=False)
    cur = db.cursor()
    cur.execute("select url from urls where num_crawl=1")
    vers = cur.fetchall()
    for i in range(cur.rowcount):
        start_urls = vers[i]
        def parse(self, response):
            hxs = HtmlXPathSelector(response)
            item = DmozItem()
            item['link'] = hxs.select('//div/ul/li/a/@href').extract()
            cursor = self.db.cursor()
            for j in range(len(item['link'])):
                cursor = self.db.cursor()
                sql = "insert into urls(url, domain, num_crawl) values ('%s','%s','%s')" % (item['link'][j], 'test', 1)
                cursor.execute(sql)
                self.db.commit()
            return item

[/CODE]
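
A likely cause: the for loop rebinds start_urls on every pass, so only the last row is left once the class body has finished running, and each row returned by fetchall() is a tuple rather than a plain string. Below is a minimal sketch of building the whole list in one go. It assumes sqlite3 as a stand-in for MySQLdb so that it runs on its own; the helper name load_start_urls and the sample rows are hypothetical, while the table and column names follow the code above.

```python
import sqlite3


def load_start_urls(conn):
    """Return every url with num_crawl = 1 as a flat list of strings."""
    cur = conn.cursor()
    cur.execute("SELECT url FROM urls WHERE num_crawl = 1")
    # fetchall() yields one tuple per row, e.g. ('http://a.example/',),
    # so keep only the first column of each row.
    return [row[0] for row in cur.fetchall()]


# Stand-in database with made-up rows (hypothetical data, not the real table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, domain TEXT, num_crawl INTEGER)")
conn.executemany(
    "INSERT INTO urls (url, domain, num_crawl) VALUES (?, ?, ?)",
    [
        ("http://a.example/", "test", 1),
        ("http://b.example/", "test", 1),
        ("http://c.example/", "test", 0),  # filtered out by num_crawl = 1
    ],
)

# Assign the complete list once instead of overwriting it inside a loop.
start_urls = load_start_urls(conn)
print(start_urls)
```

In the spider itself, start_urls would then be assigned a single time from the query result, with parse defined at class level instead of inside the loop. With MySQLdb, the insert in parse could also pass its values as the second argument to cursor.execute() rather than formatting them into the SQL string.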

