OK, it works perfectly, thanks!
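For reference, a minimal standalone sketch of the approach that worked in this thread. Plain `re` stands in for Scrapy's `Selector.re()` (which, per Rolando's note below, applies patterns with Python's `re.UNICODE` behavior), so the pattern can be tested without Scrapy; the HTML fragment is the example from the thread. The `u''` literals and the `\u20ac` escape also run unchanged on modern Python 3.

```python
# -*- coding: utf-8 -*-
import re

# Example fragment from the thread; \u20ac is the euro sign.
text = u"<span>12,76 \u20ac</span>"

# Using the \u20ac escape in the pattern sidesteps source-encoding
# issues entirely: the file stays pure ASCII.
pattern = u'(\\d+,\\d+) \u20ac'
prices = re.findall(pattern, text)
print(prices)  # -> ['12,76']
```

The same pattern string can then be passed to `sel.xpath('//span/text()').re(...)` inside a spider.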
2014-01-29 Paul Tremberth <[email protected]>:

> You could declare the encoding of the Python script containing the "€"
> character, for example with
>
>     #!/usr/bin/env python
>     # -*- coding: utf-8 -*-
>
> at the top (adapt it to the encoding used by your code editor).
>
> Or, safer but less readable, is to use the Python Unicode representation
> of the "€" character:
>
>     >>> text = u"<span>12,76 €</span>"
>     >>> [text]
>     [u'<span>12,76 \u20ac</span>']
>
> so the regex becomes
>
>     sel.xpath('//span/text()').re(u'(\d+,\d+) \u20ac')
>
> /Paul.
>
> On Wednesday, January 29, 2014 12:26:21 PM UTC+1, d4v1d wrote:
>
>> Thanks for your help.
>> I just have a problem with the encoding:
>>
>>     SyntaxError: Non-ASCII character '\x80' in file...
>>     but no encoding declared; see http://www.python.org/peps/pep-0263.html
>>
>> How can I declare this encoding in Scrapy?
>> Regards
>>
>> 2014-01-28 Rolando Espinoza La Fuente <[email protected]>:
>>
>>> You can use the euro symbol in your regex. Under the hood, Scrapy uses
>>> the re.UNICODE flag, which allows you to do that. See:
>>>
>>>     In [33]: text = u"<span>12,76 €</span>"
>>>
>>>     In [34]: sel = Selector(text=text)
>>>
>>>     In [35]: sel.xpath('//span/text()').re(u'(\d+,\d+) €')
>>>     Out[35]: [u'12,76']
>>>
>>> On Tue, Jan 28, 2014 at 5:44 PM, d4v1d <[email protected]> wrote:
>>>
>>>> Hello,
>>>> yes, you are right, my explanation was not clear.
>>>> My goal is to find the price on a web page; I assume the price is
>>>> formatted like this: 12,76 €.
>>>> I have the different URLs in a database, so I test each URL and search
>>>> for the price with a specific regex, but it doesn't accept the € symbol.
>>>> Maybe I have to specify that item['price'] is in UTF-8, but I don't
>>>> know how?
>>>>
>>>>     def parse(self, response):
>>>>         hxs = HtmlXPathSelector(response)
>>>>         item = DmozItem()
>>>>         item['price'] = hxs.select('//span/text()').re(
>>>>             '([0-9]+(?:[,.][0-9]+)?)\s')
>>>>
>>>>         for j in range(len(item['price'])):
>>>>             cursor = self.db.cursor()
>>>>             sql = "update urls set price_%s = '%s' where url = '%s'" % (
>>>>                 j, item['price'][j], response.url)
>>>>             cursor.execute(sql)
>>>>             self.db.commit()
>>>>         return item
>>>>
>>>> I hope that's clearer.
>>>> Thanks in advance,
>>>> regards
>>>>
>>>> On Tuesday, 28 January 2014 12:15:36 UTC+1, Mikołaj Roszkowski wrote:
>>>>
>>>>> It's hard to say without seeing the page's source code. The usual
>>>>> approach to this task is to crawl the necessary nodes with XPath and
>>>>> then process the scraped items in an item pipeline to extract the
>>>>> values.
>>>>> http://doc.scrapy.org/en/latest/topics/item-pipeline.html
>>>>>
>>>>> 2014-01-28 David LANGLADE <[email protected]>:
>>>>>
>>>>>> Hello,
>>>>>> Thanks for your feedback.
>>>>>> Not really: I want to crawl the whole page to find specific symbols
>>>>>> plus a numeric sequence (for example 15.23 €) and return that value.
>>>>>> Regards
>>>>>>
>>>>>> 2014-01-27 Mikołaj Roszkowski <[email protected]>:
>>>>>>
>>>>>>> You want to check the whole page's HTML content and then grab the
>>>>>>> numeric values?
>>>>>>>
>>>>>>> 2014-01-27 d4v1d <[email protected]>:
>>>>>>>
>>>>>>>> Is something like this in the right direction?
>>>>>>>>
>>>>>>>>     item['price'] = hxs.select('/html').re('[0-9]€')
>>>>>>>>
>>>>>>>> On Sunday, 26 January 2014 22:35:16 UTC+1, d4v1d wrote:
>>>>>>>>
>>>>>>>>> Hello,
>>>>>>>>> Is it possible to search a page for a specific text without having
>>>>>>>>> to specify a tag? For example, I would like to search for all the
>>>>>>>>> digits 0 to 9, and with "."
>>>>>>>>> before and after the sign $.
>>>>>>>>> It is probably possible with a regex, but I don't know how to use
>>>>>>>>> this type of tool in Scrapy.
>>>>>>>>> Thanks for your help.
>>>>>>>>> Regards

--
David LANGLADE
5 rue du patuel
42800 Saint martin la plaine
Tel : 06.49.42.38.85

--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/scrapy-users.
For more options, visit https://groups.google.com/groups/opt_out.
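A closing note on the parse() snippet earlier in the thread: building the UPDATE statement by string interpolation (`'%s' % value`) breaks as soon as a URL or price contains a quote character, and is an SQL-injection risk. A safer sketch uses parameterized queries; sqlite3 is used here purely for illustration (the table layout and example URL are hypothetical stand-ins for the thread's database, and other drivers such as MySQLdb use a different placeholder style, e.g. `%s`).

```python
import sqlite3

# In-memory stand-in for the urls table from the thread.
db = sqlite3.connect(':memory:')
db.execute("create table urls (url text, price_0 text)")
db.execute("insert into urls values ('http://example.com/p1', null)")

prices = [u'12,76']
url = 'http://example.com/p1'

for j, price in enumerate(prices):
    # Placeholders (?) let the driver handle quoting and escaping of
    # values; only the column name is interpolated, and j is an int.
    sql = "update urls set price_%d = ? where url = ?" % j
    db.execute(sql, (price, url))
db.commit()

print(db.execute("select price_0 from urls").fetchone()[0])  # -> 12,76
```

Committing once after the loop (rather than once per row, as in the thread's snippet) also avoids one transaction per price.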
