Re: [Tutor] RE expressions
Steve Willoughby wrote:
> Johan Nilsson wrote:
> > In [74]: p.findall('asdsa"123abc\123"jggfds')
> > Out[74]: ['"123abcS"']

By the way, you're confusing three different things: the use of \ in Python string literals, the use of \ in regular expressions, and the appearance of \ as a character in the data strings your Python program encounters.

When you write the code:

    p.findall('asdsa"123abc\123"jggfds')

the string literal 'asdsa"123abc\123"jggfds' contains the escape sequence \123, which means "the ASCII character with the octal value 123". That happens to be the letter S (octal 123 is decimal 83). So that's the same as if you had typed:

    p.findall('asdsa"123abcS"jggfds')

which explains your results. Using a raw string, r'asdsa"123abc\123"jggfds', would have solved that problem, because a raw string leaves the backslash in place as an ordinary character.

___
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
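To see the difference for yourself, here is a minimal sketch (Python 3 assumed). The variable names plain and raw are only illustrative and do not come from the original session; the pattern is Johan's, written as a raw string.

```python
import re

# Johan's pattern, written as a raw string so that \w reaches the
# regex engine unchanged.
p = re.compile(r'"\w+"')

# In an ordinary literal, \123 is an octal escape and becomes 'S'.
plain = 'asdsa"123abc\123"jggfds'
# In a raw literal, the backslash and the digits stay as four characters.
raw = r'asdsa"123abc\123"jggfds'

print(plain == 'asdsa"123abcS"jggfds')  # True
print(len(plain), len(raw))             # 20 23
print(p.findall(plain))                 # ['"123abcS"']
print(p.findall(raw))                   # []
```

The last result is empty because \w does not match a backslash, so once the data really contains one, the original pattern no longer finds the quoted text.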
Re: [Tutor] RE expressions
Johan Nilsson wrote:
> 'text "http:\123\interesting_adress\etc\etc\" more text'

Does this really use backslashes in the text? The standard for URLs (if that's what it is) is to use forward slashes.

For your RE, though, you can always use [...] to specify a set of characters including whatever you like. Remember that \ is a special symbol in REs, too. If you want to match a literal \ character, the RE for that is \\. Also remember to use a raw string in Python so the string-literal syntax doesn't get confused by the backslashes too.

How about something along the lines of:

    re.compile(r'"[a-zA-Z0-9_\\]*"')

but why constrain what may be between the quotes?

    re.compile(r'"[^"]*"')

or even

    re.compile('".*?"')

> I have figured out that if it wasn't for the \ a simple
> p=re.compile('\"\w+\"') would do the trick. From what I understand \w
> only covers the set [a-zA-Z0-9_] and hence not the "\". I assume the
> solution is just in front of my eyes, and I have been looking on the
> screen for too long. Any hints would be appreciated.
>
> In [72]: p=re.compile('"\w+\"')
> In [73]: p.findall('asdsa"123abc123"jggfds')
> Out[73]: ['"123abc123"']
> In [74]: p.findall('asdsa"123abc\123"jggfds')
> Out[74]: ['"123abcS"']
>
> /Johan
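As a rough check, the three patterns suggested above can be run against Johan's sample string. This is only a sketch: the names text, narrow, widened, negated and lazy are made up for illustration, and the widened variant with ':' added to the set is an assumption of this example, not something proposed in the thread.

```python
import re

# Johan's sample, as a raw string so every backslash is kept literally.
text = r'text "http:\123\interesting_adress\etc\etc\" more text'

narrow = re.compile(r'"[a-zA-Z0-9_\\]*"')    # word characters plus backslash
widened = re.compile(r'"[a-zA-Z0-9_\\:]*"')  # assumption: same set plus ':'
negated = re.compile(r'"[^"]*"')             # anything except a double quote
lazy = re.compile(r'".*?"')                  # shortest run between two quotes

expected = r'"http:\123\interesting_adress\etc\etc\"'

print(narrow.findall(text))                # [] because ':' is not in the set
print(widened.findall(text) == [expected]) # True
print(negated.findall(text) == [expected]) # True
print(lazy.findall(text) == [expected])    # True
```

The explicit set only works if it lists every character that can occur between the quotes, which is why the two unconstrained patterns are the safer choice when the contents are not known in advance.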
[Tutor] RE expressions
Hi all python experts,

I am trying to work with BeautifulSoup and re and am running into one problem. What I want to do is open a webpage and get some information. This is working fine. I then want to follow some of the links on this page and process them. I manage to filter out the links I am interested in with simple regular expressions.

My problem is that I now have a number of strings that look like

    'text "http:\123\interesting_adress\etc\etc\" more text'

I have figured out that if it wasn't for the \ a simple

    p=re.compile('\"\w+\"')

would do the trick. From what I understand \w only covers the set [a-zA-Z0-9_] and hence not the "\". I assume the solution is just in front of my eyes, and I have been looking at the screen for too long. Any hints would be appreciated.

    In [72]: p=re.compile('"\w+\"')
    In [73]: p.findall('asdsa"123abc123"jggfds')
    Out[73]: ['"123abc123"']
    In [74]: p.findall('asdsa"123abc\123"jggfds')
    Out[74]: ['"123abcS"']

/Johan