Thanks for your help. I just have a problem with the encoding:

    SyntaxError: Non-ASCII character '\x80' in file ... but no encoding declared; see http://www.python.org/peps/pep-0263.html

How can I declare this encoding in Scrapy?
Regards

2014-01-28 Rolando Espinoza La Fuente <[email protected]>:

> You can use the euro symbol in your regex. Scrapy under the hood uses the
> flag re.UNICODE, which allows you to do that. See:
>
> In [33]: text = u"<span>12,76 EURO</span>"
>
> In [34]: sel = Selector(text=text)
>
> In [35]: sel.xpath('//span/text()').re(u'(\d+,\d+) EURO')
> Out[35]: [u'12,76']
>
>
> On Tue, Jan 28, 2014 at 5:44 PM, d4v1d <[email protected]> wrote:
>
>> Hello
>> Yes, you are right, my explanations were not clear.
>> My objective is to find the price on a web page; I assume the price
>> is formatted like this: 12,76 EURO
>> I have the different URLs in a database, so I visit each URL and search
>> for the price with a specific regex, but it doesn't accept the EURO symbol.
>> Maybe I have to specify that item['price'] is in UTF-8, but I don't
>> know how?
>>
>> def parse(self, response):
>>     hxs = HtmlXPathSelector(response)
>>     item = DmozItem()
>>     item['price'] = hxs.select('//span/text()').re(
>>         '([0-9]+(?:[,.][0-9])?)\s')
>>
>>     cur = self.db.cursor()
>>     cur.execute("select url from urls")
>>     for j in range(len(item['price'])):
>>         cursor = self.db.cursor()
>>         sql = "update urls set price_%s = '%s' where url = '%s'" % (j,
>>             item['price'][j], response.url)
>>         cursor.execute(sql)
>>         self.db.commit()
>>     return item
>>
>> I hope it's clearer.
>> Thanks in advance
>> Regards
>>
>>
>> On Tuesday, January 28, 2014 12:15:36 UTC+1, Mikołaj Roszkowski wrote:
>>>
>>> It's hard to say without seeing the page's source code. The usual approach
>>> to this task is to crawl the necessary nodes with XPath and then process
>>> those scraped items in the item pipeline to extract the values.
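[Editor's note] Answering the SyntaxError at the top of the thread: Python 2 refuses to compile a source file containing non-ASCII bytes, such as a literal euro sign, unless the file carries a PEP 263 encoding declaration on its first or second line. A minimal sketch combining that fix with the re.UNICODE behaviour Rolando describes (the sample text is a placeholder, not from the thread):

```python
# -*- coding: utf-8 -*-
# The line above is the PEP 263 encoding declaration; it must appear on
# the first or second line of the file and fixes the
# "Non-ASCII character '\x80' ... but no encoding declared" SyntaxError
# that Python 2 raises when a literal € sits in the source.
import re

text = u'<span>12,76 €</span>'

# Scrapy's .re() applies re.UNICODE under the hood; passing the same
# flag here lets \d match Unicode digits and allows the € to appear
# directly in the pattern.
match = re.search(u'(\\d+,\\d+)\\s*€', text, re.UNICODE)
price = match.group(1) if match else None
```

The same pattern can be passed to Scrapy's `.re()` without the explicit flag, since Scrapy sets it internally.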
>>> http://doc.scrapy.org/en/latest/topics/item-pipeline.html
>>>
>>>
>>> 2014-01-28 David LANGLADE <[email protected]>:
>>>
>>>> Hello
>>>> Thanks for your feedback.
>>>> Not really; I want to crawl the whole page to find a specific symbol plus
>>>> a numeric sequence (for example 15.23 EURO) and return this value.
>>>> Regards
>>>>
>>>>
>>>> 2014-01-27 Mikołaj Roszkowski <[email protected]>:
>>>>
>>>>> You want to check the whole page's HTML content and then grab the
>>>>> numeric values?
>>>>>
>>>>>
>>>>> 2014-01-27 d4v1d <[email protected]>:
>>>>>
>>>>>> Is something like this in the right direction?
>>>>>>
>>>>>> item['price'] = hxs.select('/html').re('[0-9]€')
>>>>>>
>>>>>>
>>>>>> On Sunday, January 26, 2014 22:35:16 UTC+1, d4v1d wrote:
>>>>>>
>>>>>>> Hello
>>>>>>> Is it possible to search a page for a specific text without having to
>>>>>>> specify a tag?
>>>>>>> For example, I would like to search for all the digits 0 to 9, with a
>>>>>>> ".", before and after the $ sign.
>>>>>>> It is probably possible with a regex, but I don't know how to use this
>>>>>>> type of tool in Scrapy.
>>>>>>> Thanks for your help
>>>>>>> Regards
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "scrapy-users" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> To post to this group, send email to [email protected].
>>>>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
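[Editor's note] The item-pipeline approach linked earlier in the thread (http://doc.scrapy.org/en/latest/topics/item-pipeline.html) could look roughly like this. The class name, the 'raw_text'/'price' field names, and the regex are illustrative assumptions, not code from the thread:

```python
import re

class PriceExtractionPipeline(object):
    """Sketch of a Scrapy item pipeline that pulls a price such as
    "15.23 EURO" out of raw scraped text. With this approach the spider
    only collects the candidate text nodes; the extraction logic lives
    here, in process_item."""

    # Matches "15.23 EURO", "12,76 EURO", or "12,76 €" (assumed format).
    PRICE_RE = re.compile(u'(\\d+[.,]\\d{2})\\s*(?:EURO|€)', re.UNICODE)

    def process_item(self, item, spider):
        raw = item.get(u'raw_text', u'')
        match = self.PRICE_RE.search(raw)
        if match:
            item[u'price'] = match.group(1)
        return item
```

Registering such a class under ITEM_PIPELINES in settings.py would make Scrapy call process_item for every scraped item.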
>>>>
>>>> --
>>>> David LANGLADE
>>>> 5 rue du patuel
>>>> 42800 Saint martin la plaine
>>>> Tel : 06.49.42.38.85
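[Editor's note] One more remark on the parse() code quoted above: building the UPDATE statement with % string formatting breaks as soon as a price or URL contains a quote character, and it is open to SQL injection. DB-API parameter binding avoids both problems. A self-contained sketch using sqlite3 (with MySQLdb, as apparently used in the thread, the placeholder would be %s instead of ?); the price_<n> column layout follows the original code:

```python
import sqlite3

def save_prices(conn, url, prices):
    # Column names cannot be bound as parameters, so the index (an int
    # we generate ourselves) is formatted into the SQL; the price and
    # URL values are bound via "?" placeholders, which handles quoting
    # and escaping safely instead of string formatting.
    cur = conn.cursor()
    for i, price in enumerate(prices):
        cur.execute("UPDATE urls SET price_%d = ? WHERE url = ?" % i,
                    (price, url))
    conn.commit()

# Demo setup mirroring the thread's table (price_0, price_1, ... columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (url TEXT, price_0 TEXT, price_1 TEXT)")
conn.execute("INSERT INTO urls (url) VALUES ('http://example.com/page')")
save_prices(conn, "http://example.com/page", ["12,76", "8,50"])
```

In the spider itself, the same function could be called from parse() with self.db, response.url, and item['price'].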
