OK, it works perfectly, thanks!
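For reference, a minimal standalone sketch of the approach that worked in this thread. Plain `re` stands in for Scrapy's `Selector.re()` (which, per Rolando's note below, applies patterns with Python's `re.UNICODE` behavior), so the pattern can be tested without Scrapy; the HTML fragment is the example from the thread. The `u''` literals and the `\u20ac` escape also run unchanged on modern Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Example fragment from the thread; \u20ac is the euro sign.
text = u"<span>12,76 \u20ac</span>"

# Using the \u20ac escape in the pattern sidesteps source-encoding
# issues entirely: the file stays pure ASCII.
pattern = u'(\\d+,\\d+) \u20ac'
prices = re.findall(pattern, text)
print(prices)  # -> ['12,76']
```

The same pattern string can then be passed to `sel.xpath('//span/text()').re(...)` inside a spider.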
2014-01-29 Paul Tremberth <[email protected]>:

> You could declare the encoding of the Python script containing the "€"
> character, for example with
>
>     #!/usr/bin/env python
>     # -*- coding: utf-8 -*-
>
> at the top (adapt it to the encoding used by your code editor).
>
> Or, safer but less readable, is to use the Python Unicode representation
> of the "€" character:
>
>     >>> text = u"<span>12,76 €</span>"
>     >>> [text]
>     [u'<span>12,76 \u20ac</span>']
>
> so the regex becomes
>
>     sel.xpath('//span/text()').re(u'(\d+,\d+) \u20ac')
>
> /Paul.
>
> On Wednesday, January 29, 2014 12:26:21 PM UTC+1, d4v1d wrote:
>
>> Thanks for your help.
>> I just have a problem with the encoding:
>>
>>     SyntaxError: Non-ASCII character '\x80' in file...
>>     but no encoding declared; see http://www.python.org/peps/pep-0263.html
>>
>> How can I declare this encoding in Scrapy?
>> Regards
>>
>> 2014-01-28 Rolando Espinoza La Fuente <[email protected]>:
>>
>>> You can use the euro symbol in your regex. Under the hood, Scrapy uses
>>> the re.UNICODE flag, which allows you to do that. See:
>>>
>>>     In [33]: text = u"<span>12,76 €</span>"
>>>
>>>     In [34]: sel = Selector(text=text)
>>>
>>>     In [35]: sel.xpath('//span/text()').re(u'(\d+,\d+) €')
>>>     Out[35]: [u'12,76']
>>>
>>> On Tue, Jan 28, 2014 at 5:44 PM, d4v1d <[email protected]> wrote:
>>>
>>>> Hello,
>>>> yes, you are right, my explanation was not clear.
>>>> My goal is to find the price on a web page; I assume the price is
>>>> formatted like this: 12,76 €.
>>>> I have the different URLs in a database, so I test each URL and search
>>>> for the price with a specific regex, but it doesn't accept the € symbol.
>>>> Maybe I have to specify that item['price'] is in UTF-8, but I don't
>>>> know how?
>>>>
>>>>     def parse(self, response):
>>>>         hxs = HtmlXPathSelector(response)
>>>>         item = DmozItem()
>>>>         item['price'] = hxs.select('//span/text()').re(
>>>>             '([0-9]+(?:[,.][0-9]+)?)\s')
>>>>
>>>>         for j in range(len(item['price'])):
>>>>             cursor = self.db.cursor()
>>>>             sql = "update urls set price_%s = '%s' where url = '%s'" % (
>>>>                 j, item['price'][j], response.url)
>>>>             cursor.execute(sql)
>>>>             self.db.commit()
>>>>         return item
>>>>
>>>> I hope that's clearer.
>>>> Thanks in advance,
>>>> regards
>>>>
>>>> On Tuesday, 28 January 2014 12:15:36 UTC+1, Mikołaj Roszkowski wrote:
>>>>
>>>>> It's hard to say without seeing the page's source code. The usual
>>>>> approach to this task is to crawl the necessary nodes with XPath and
>>>>> then process the scraped items in an item pipeline to extract the
>>>>> values.
>>>>> http://doc.scrapy.org/en/latest/topics/item-pipeline.html
>>>>>
>>>>> 2014-01-28 David LANGLADE <[email protected]>:
>>>>>
>>>>>> Hello,
>>>>>> Thanks for your feedback.
>>>>>> Not really: I want to crawl the whole page to find specific symbols
>>>>>> plus a numeric sequence (for example 15.23 €) and return that value.
>>>>>> Regards
>>>>>>
>>>>>> 2014-01-27 Mikołaj Roszkowski <[email protected]>:
>>>>>>
>>>>>>> You want to check the whole page's HTML content and then grab the
>>>>>>> numeric values?
>>>>>>>
>>>>>>> 2014-01-27 d4v1d <[email protected]>:
>>>>>>>
>>>>>>>> Is something like this in the right direction?
>>>>>>>>
>>>>>>>>     item['price'] = hxs.select('/html').re('[0-9]€')
>>>>>>>>
>>>>>>>> On Sunday, 26 January 2014 22:35:16 UTC+1, d4v1d wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> Is it possible to search a page for a specific text without having
>>>>>>>>> to specify a tag? For example, I would like to search for all the
>>>>>>>>> digits 0 to 9, and with "."
>>>>>>>>> before and after the sign $.
>>>>>>>>> It is probably possible with a regex, but I don't know how to use
>>>>>>>>> this type of tool in Scrapy.
>>>>>>>>> Thanks for your help.
>>>>>>>>> Regards

--
David LANGLADE
5 rue du patuel
42800 Saint martin la plaine
Tel : 06.49.42.38.85

--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.
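A closing note on the parse() snippet earlier in the thread: building the UPDATE statement by string interpolation (`'%s' % value`) breaks as soon as a URL or price contains a quote character, and is an SQL-injection risk. A safer sketch uses parameterized queries; sqlite3 is used here purely for illustration (the table layout and example URL are hypothetical stand-ins for the thread's database, and other drivers such as MySQLdb use a different placeholder style, e.g. `%s`).

```python
import sqlite3

# In-memory stand-in for the urls table from the thread.
db = sqlite3.connect(':memory:')
db.execute("create table urls (url text, price_0 text)")
db.execute("insert into urls values ('http://example.com/p1', null)")

prices = [u'12,76']
url = 'http://example.com/p1'

for j, price in enumerate(prices):
    # Placeholders (?) let the driver handle quoting and escaping of
    # values; only the column name is interpolated, and j is an int.
    sql = "update urls set price_%d = ? where url = ?" % j
    db.execute(sql, (price, url))
db.commit()

print(db.execute("select price_0 from urls").fetchone()[0])  # -> 12,76
```

Committing once after the loop (rather than once per row, as in the thread's snippet) also avoids one transaction per price.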
