[Tutor] Regex Question
Hello,

I have a TSV file that has the city, state, country information in this format:

    Name    Display name    Code
    San Jose    SJC    SJC - SJ (POP), CA (US)
    San Francisco    SFO    SFO - SF, CA (US)

I need to extract the state and country for each city from this file. I'm trying to do this in Python by using the following regex:

    s = re.search(',(.*?)\(', text)
    if s:
        state = s.group(1).strip()
    c = re.search('\((.*?)\)', text)
    if c:
        country = c.group(1).strip()

This works well for the state. But for the country for San Jose, it returns the following:

    country = POP

I think it may be better to search from the end of the string, but I am unable to get the right syntax. Could you please share any pointers?

Thanks!
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Regex Question
On 30/9/2013 16:29, Leena Gupta wrote:
> Hello, I have a TSV file that has the city, state, country information in this format:
>
>     Name    Display name    Code
>     San Jose    SJC    SJC - SJ (POP), CA (US)
>     San Francisco    SFO    SFO - SF, CA (US)

That's not a format, it's an infinitesimally tiny sample. But if we trust this sample, you don't need a regex at all. The state and country are in the last 7 characters of the string:

    country = text[-3:-1]
    state = text[-7:-5]

I could be off by 1 or 2, but you get the idea. If this isn't good enough, then either supply or give a reference to a specification for how the code is encoded.

(If it does indeed need a regex, someone else will have to help.)

--
DaveA
Re: [Tutor] Regex Question
On 30/09/2013 21:29, Leena Gupta wrote:
> Hello, I have a TSV file that has the city, state, country information in this format:
>
>     Name    Display name    Code
>     San Jose    SJC    SJC - SJ (POP), CA (US)
>     San Francisco    SFO    SFO - SF, CA (US)
>
> I need to extract the state and country for each city from this file. I'm trying to do this in Python by using the following regex:
>
>     s = re.search(',(.*?)\(', text)
>     if s:
>         state = s.group(1).strip()
>     c = re.search('\((.*?)\)', text)
>     if c:
>         country = c.group(1).strip()
>
> This works well for the state. But for the country for San Jose, it returns the following:
>
>     country = POP
>
> I think it may be better to search from the end of the string, but I am unable to get the right syntax. Could you please share any pointers? Thanks!

I'd be strongly inclined to use the csv module from the standard library with the excel-tab dialect, see http://docs.python.org/3/library/csv.html#module-csv

Please try it, and if you encounter any problems feel free to get back to us. We don't bite :)

--
Cheers.

Mark Lawrence
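The two suggestions in this thread can be combined: let the csv module split the tab-separated columns, then anchor a regex at the end of the display field so that an earlier "(POP)" cannot be picked up. The following is only a sketch. The sample data, the column order, the helper name parse_rows, and the assumption that the display text is the last column are all guesses based on the two rows quoted in the question.

```python
import csv
import io
import re

# Hypothetical sample modelled on the question; the real file layout may differ.
SAMPLE = (
    "Name\tCode\tDisplay name\n"
    "San Jose\tSJC\tSJC - SJ (POP), CA (US)\n"
    "San Francisco\tSFO\tSFO - SF, CA (US)\n"
)

# State: text between the last comma and the final "(...)".
# Country: contents of the parentheses that end the string ($ anchors the match).
TAIL = re.compile(r",\s*([^,(]+?)\s*\(([^()]*)\)\s*$")


def parse_rows(handle):
    """Yield (name, state, country) for each data row of a tab-separated file."""
    reader = csv.reader(handle, dialect="excel-tab")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:
            continue  # ignore blank lines
        match = TAIL.search(row[-1])  # assumption: display text is the last column
        if match:
            yield row[0], match.group(1), match.group(2)


rows = list(parse_rows(io.StringIO(SAMPLE)))
for name, state, country in rows:
    print(name, state, country)
```

With a real file, something like open(path, newline="") would take the place of io.StringIO(SAMPLE). Rows that do not end in ", XX (YY)" are silently skipped here, which may or may not be what is wanted.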
Re: [Tutor] regex question
On Fri, 6 Apr 2012, Khalid Al-Ghamdi wrote:
> hi all,
> I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:
>
>     >>> redata
>     'Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 Sat Jul 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 Mon Sep 21 20:22:37 1987::ttw...@rpybrct.edu::559243357-6-7 Fri Aug 2 07:15:23 1991::t...@mgfyitsks.net::681106523-4-9 Mon Mar 18 19:59:47 2024::dgz...@fhyykji.org::1710781187-6-7 '
>     >>> patt = r'\w+\.\w{3}(?=@)'
>     >>> re.findall(patt, redata)
>     []
>
> This pattern works but the first should, too. Shouldn't it?

The all too familiar quote looks like it applies here:

    Often programmers, when faced with a problem, think 'Aha! I'll use a regex!'. Now you have two problems.

It looks like you could easily split this string with redata.split('::') and then look at every second element in the list and split *that* element on the last '.' in the string. With data as well-formed as this, regex is probably overkill.

HTH,
Wayne
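A minimal sketch of the split-based idea, under stated assumptions: the addresses below are invented because the archive obscured the real ones, and each record is assumed to sit on its own line in a date::address::id layout. The sketch splits per line first rather than splitting the whole string at once, and the function name domains is hypothetical.

```python
# Invented records in the same "date::address::id" layout as the question.
redata = (
    "Tue Jan 14 00:43:21 2020::alice@example.gov::1578951801-6-10\n"
    "Sat Jul 31 15:17:39 1993::bob@mail.example.com::744121059-5-6\n"
)


def domains(text):
    """Return (domain, suffix) pairs found in '::'-separated records."""
    found = []
    for record in text.splitlines():
        fields = record.split("::")
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        # Everything after the last '@' is the domain.
        domain = fields[1].rpartition("@")[2]
        # rpartition splits on the LAST '.', so sub-domains stay intact.
        suffix = domain.rpartition(".")[2]
        found.append((domain, suffix))
    return found


result = domains(redata)
print(result)
```

Because no assumption is made about the length of the suffix, two-letter country codes are handled the same way as .com or .gov.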
Re: [Tutor] regex question
I think you can do this:

    a = []
    b = redata.split('::')
    for e in b:
        if e.find('@') != -1:
            a.append(e.split('@')[1])

The list a then contains all the domains.

On April 9, 2012 at 5:26 AM, Wayne Werner wa...@waynewerner.com wrote:
> On Fri, 6 Apr 2012, Khalid Al-Ghamdi wrote:
> > hi all,
> > I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:
> >
> >     >>> redata
> >     'Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 Sat Jul 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 Mon Sep 21 20:22:37 1987::ttw...@rpybrct.edu::559243357-6-7 Fri Aug 2 07:15:23 1991::t...@mgfyitsks.net::681106523-4-9 Mon Mar 18 19:59:47 2024::dgz...@fhyykji.org::1710781187-6-7 '
> >     >>> patt = r'\w+\.\w{3}(?=@)'
> >     >>> re.findall(patt, redata)
> >     []
> >
> > This pattern works but the first should, too. Shouldn't it?
>
> The all too familiar quote looks like it applies here:
>
>     Often programmers, when faced with a problem, think 'Aha! I'll use a regex!'. Now you have two problems.
>
> It looks like you could easily split this string with redata.split('::') and then look at every second element in the list and split *that* element on the last '.' in the string. With data as well-formed as this, regex is probably overkill.
>
> HTH,
> Wayne

--
twitter: @zybest https://twitter.com/#!/zybest
Sina Weibo: @爱子悦 http://www.weibo.com/zybest
WordPress set up on OpenShift: http://blog-mking.rhcloud.com/
Re: [Tutor] regex question
Khalid Al-Ghamdi wrote:
> I'm trying to extract the domain in the following string. Why doesn't my pattern (patt) work:
>
>     >>> redata
>     'Tue Jan 14 00:43:21 2020::eax...@gstwyysnbd.gov::1578951801-6-10 Sat Jul 31 15:17:39 1993::rz...@wgxvhx.com::744121059-5-6 Mon Sep 21 20:22:37 1987::ttw...@rpybrct.edu::559243357-6-7 Fri Aug 2 07:15:23 1991::t...@mgfyitsks.net::681106523-4-9 Mon Mar 18 19:59:47 2024::dgz...@fhyykji.org::1710781187-6-7 '
>     >>> patt = r'\w+\.\w{3}(?=@)'
>     >>> re.findall(patt, redata)
>     []
>
> This pattern works but the first should, too. Shouldn't it?

No. I think you want r'(?<=@)\w+\.\w{3}'.

How do you handle a domain like web.de, by the way?
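The difference between the two assertions can be checked directly. A lookahead written after the pattern, (?=@), requires an '@' to follow the match, which never happens for a domain. A lookbehind written before it, (?<=@), requires an '@' to precede the match. The sketch below uses invented addresses, and the last pattern is only one possible way to allow two-letter suffixes such as web.de, not a general e-mail parser.

```python
import re

# Invented record; the archive obscured the real addresses.
line = "Sat Jul 31 15:17:39 1993::bob@example.com::744121059-5-6"

# Lookahead after the match: '@' would have to FOLLOW the domain, so nothing matches.
after = re.findall(r"\w+\.\w{3}(?=@)", line)

# Lookbehind before the match: '@' has to PRECEDE the domain.
before = re.findall(r"(?<=@)\w+\.\w{3}", line)

# Looser variant (an assumption): sub-domains allowed, suffix of two or more letters.
loose = re.findall(r"(?<=@)[\w.-]+\.\w{2,}", "x@web.de y@mail.example.org")

print(after, before, loose)
```

Python's re module only accepts fixed-width lookbehinds; (?<=@) qualifies because it is exactly one character wide.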
Re: [Tutor] Regex question
I continue working with RegExp, but I have reached a point for which I can't find documentation. Maybe there is no possible way to do it; anyway, I'll throw the question out there.

This is my code:

    contents = re.sub(r'Á', "A", contents)
    contents = re.sub(r'á', "a", contents)
    contents = re.sub(r'É', "E", contents)
    contents = re.sub(r'é', "e", contents)
    contents = re.sub(r'Í', "I", contents)
    contents = re.sub(r'í', "i", contents)
    contents = re.sub(r'Ó', "O", contents)
    contents = re.sub(r'ó', "o", contents)
    contents = re.sub(r'Ú', "U", contents)
    contents = re.sub(r'ú', "u", contents)

It is clear that I need to convert each accented vowel into the same unaccented vowel. The question is: is there a way to say that whenever you find an accented character it has to change into the unaccented character? Not every character, though: it must be only these vowels, accented this way, because in the language I am working with there are letters like ü and ñ that should remain the same.

Thank you all.

___
andrés chandía

Do not print unnecessarily. Take care of the environment!
Re: [Tutor] Regex question
2011/4/3 Andrés Chandía and...@chandia.net:
> I continue working with RegExp, but I have reached a point for which I can't find documentation. Maybe there is no possible way to do it; anyway, I'll throw the question out there.
>
> This is my code:
>
>     contents = re.sub(r'Á', "A", contents)
>     contents = re.sub(r'á', "a", contents)
>     contents = re.sub(r'É', "E", contents)
>     contents = re.sub(r'é', "e", contents)
>     contents = re.sub(r'Í', "I", contents)
>     contents = re.sub(r'í', "i", contents)
>     contents = re.sub(r'Ó', "O", contents)
>     contents = re.sub(r'ó', "o", contents)
>     contents = re.sub(r'Ú', "U", contents)
>     contents = re.sub(r'ú', "u", contents)
>
> It is clear that I need to convert each accented vowel into the same unaccented vowel. The question is: is there a way to say that whenever you find an accented character it has to change into the unaccented character? Not every character, though: it must be only these vowels, accented this way, because in the language I am working with there are letters like ü and ñ that should remain the same.

Okay, first thing, forget about regexes for this problem. They're too complicated and not suited to it.

Encoding issues make this a somewhat complicated problem. In Unicode, there are two ways to encode most accented characters. For example, the character Ć can be encoded both by U+0106, LATIN CAPITAL LETTER C WITH ACUTE, and by a combination of U+0043 and U+0301, being simply 'C' and the 'COMBINING ACUTE ACCENT', respectively. You must remove both forms to be sure every accented character is gone from your string.

Using unicode.translate, you can craft a translation table to translate the accented characters to their non-accented counterparts. The combining characters can simply be removed by mapping them to None.

HTH,
Hugo
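The two encodings of Ć can be inspected with the standard unicodedata module. This is a small sketch written for Python 3, where str.translate plays the role that unicode.translate had in Python 2; the variable names are only illustrative.

```python
import unicodedata

composed = "\u0106"      # LATIN CAPITAL LETTER C WITH ACUTE
decomposed = "C\u0301"   # 'C' followed by COMBINING ACUTE ACCENT

# The two spellings look the same but compare unequal until normalized.
same_before = composed == decomposed
same_after = unicodedata.normalize("NFC", decomposed) == composed

# Mapping the combining character to None deletes it from the string.
stripped = unicodedata.normalize("NFD", composed).translate({0x0301: None})

print(same_before, same_after, stripped)
```

Normalizing to one form first means only one of the two encodings has to be handled afterwards.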
Re: [Tutor] Regex question
Hugo Arts wrote:
> 2011/4/3 Andrés Chandía and...@chandia.net:
> > I continue working with RegExp, but I have reached a point for which I can't find documentation. Maybe there is no possible way to do it; anyway, I'll throw the question out there.
> >
> > This is my code:
> >
> >     contents = re.sub(r'Á', "A", contents)
> >     contents = re.sub(r'á', "a", contents)
> >     contents = re.sub(r'É', "E", contents)
> >     contents = re.sub(r'é', "e", contents)
> >     contents = re.sub(r'Í', "I", contents)
> >     contents = re.sub(r'í', "i", contents)
> >     contents = re.sub(r'Ó', "O", contents)
> >     contents = re.sub(r'ó', "o", contents)
> >     contents = re.sub(r'Ú', "U", contents)
> >     contents = re.sub(r'ú', "u", contents)
> >
> > It is clear that I need to convert each accented vowel into the same unaccented vowel. The question is: is there a way to say that whenever you find an accented character it has to change into the unaccented character? Not every character, though: it must be only these vowels, accented this way, because in the language I am working with there are letters like ü and ñ that should remain the same.
>
> Okay, first thing, forget about regexes for this problem. They're too complicated and not suited to it.
>
> Encoding issues make this a somewhat complicated problem. In Unicode, there are two ways to encode most accented characters. For example, the character Ć can be encoded both by U+0106, LATIN CAPITAL LETTER C WITH ACUTE, and by a combination of U+0043 and U+0301, being simply 'C' and the 'COMBINING ACUTE ACCENT', respectively. You must remove both forms to be sure every accented character is gone from your string.
>
> Using unicode.translate, you can craft a translation table to translate the accented characters to their non-accented counterparts. The combining characters can simply be removed by mapping them to None.

If you go that road you might be interested in Fredrik Lundh's article at http://effbot.org/zone/unicode-convert.htm

The class presented there is a bit tricky, but for your purpose it might be sufficient to subclass it:

    >>> KEEP_CHARS = set(ord(c) for c in u"üñ")
    >>> class Map(unaccented_map):
    ...     def __missing__(self, key):
    ...         if key in KEEP_CHARS:
    ...             self[key] = key
    ...             return key
    ...         return unaccented_map.__missing__(self, key)
    ...
    >>> print u"äöü".translate(Map())
    aoü
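For the narrower goal stated in the original question (only the ten acute-accented vowels change, while ü and ñ stay), a fixed translation table may be enough. This is a sketch for Python 3 and is not the class from the article mentioned above. The function name strip_acute_vowels is made up for the example, and the text is normalized to NFC first so that a vowel followed by a combining acute accent is handled too.

```python
import unicodedata

# Only these ten characters are changed; everything else (ü, ñ, ...) is left alone.
ACCENTED = unicodedata.normalize("NFC", "ÁáÉéÍíÓóÚú")
TABLE = str.maketrans(ACCENTED, "AaEeIiOoUu")


def strip_acute_vowels(text):
    """Replace acute-accented vowels with plain vowels, keeping other letters."""
    # NFC composes 'a' + U+0301 into 'á', so one table covers both encodings.
    return unicodedata.normalize("NFC", text).translate(TABLE)


sample = "Canción pingüino año A\u0301rbol"
print(strip_acute_vowels(sample))
```

One side effect worth knowing: the returned string is always in NFC form, even for characters the table does not touch.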
[Tutor] Regex question
Andrés Chandía and...@chandia.net wrote
> I'm new to this list, so hello everybody!.

Hi, welcome to the list.

Please do not use reply to start a new thread; it confuses threaded readers and may mean your message will not be seen. Also please supply a meaningful subject (as above) so we can decide if it looks like something we can answer! These will help you maximise the replies.

Also, although not relevant here, please include the full text of any error messages and the Python version and OS you are using (2 or 3 etc). Basically anything that helps us understand the context.

> in perl there is a way to reference previous registers,
>
>     $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
>
> I'm looking for the way to do it in python

It is possible but I'll let some of the more regex literate users tell you how :-)

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] Regex question
On 29-Mar-11 23:55, Alan Gauld wrote:
> Andrés Chandía and...@chandia.net wrote
> > in perl there is a way to reference previous registers,
> >
> >     $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
> >
> > I'm looking for the way to do it in python

If you're using just a straight call to re.sub(), it works like this:

    text = re.sub(r'<u>(l|L|n|N)</u>', '\1e', text)

You use \1, \2, etc. for backreferences just like all the other regex-based editors do (Perl's more of an exception than the rule there).

Alternatively, you can pre-compile the regular expression into an object:

    pattern = re.compile(r'<u>(l|L|n|N)</u>')

and then substitute by calling its sub() method:

    text = pattern.sub('\1e', text)

--
Steve Willoughby / st...@alchemy.com
A ship in harbor is safe, but that is not what ships are built for.
PGP Fingerprint 48A3 2621 E72C 31D9 2928 2E8F 6506 DB29 54F7 0F53
Re: [Tutor] Regex question
Thanks Kushal and Steve.

I think it works. I say "I think" because in the results I get a strange character instead of the letter that should appear.

This is my regexp:

    contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)

This is my input file content:

    <u>l</u>omo <u>n</u>omo <u>t</u>omo <u>L</u>omo <u>N</u>omo <u>T</u>omo <span style="text-decoration: underline;">n</span>omo <u>t</u>omo

This is my output file content:

    'omo 'omo 'omo 'omo 'omo 'omo 'omo 'omo

At the head of the file I have:

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

I tried changing the coding to iso-8859-15, but nothing changed. You surely know the reason for this; can you share it with this poor newbie?

Thanks a lot!!

On Wed, March 30, 2011 09:46, Kushal Kumaran wrote:
> 2011/3/30 Andrés Chandía and...@chandia.net:
> > I'm new to this list, so hello everybody!.
>
> Hello Andrés
>
> > The stuff:
> > I'm working with regexps and this is my line:
> >
> >     contents = re.sub("<u>l<\/u>", "le", contents)
> >
> > in perl there is a way to reference previous registers, i.e.
> >
> >     $text =~ s/<u>(l|L|n|N)<\/u>/$1e/g;
> >
> > So I'm looking for the way to do it in python, obviously this does not work:
> >
> >     contents = re.sub("<u>(l|L|n|N)<\/u>", "$1e", contents)
>
> You will use \1 for the backreference. The documentation of the re module (http://docs.python.org/library/re.html#re.sub) has an example. Also note the use of raw strings (r'...') to avoid having to escape the backslash with another backslash.

___
andrés chandía
Re: [Tutor] Regex question
On 30-Mar-11 08:21, Andrés Chandía wrote:
> Thanks Kushal and Steve.
>
> I think it works. I say "I think" because in the results I get a strange character instead of the letter that should appear.
>
> This is my regexp:
>
>     contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)

Remember that \2 in a string means the ASCII character with the code 002. You need to escape this with an extra backslash:

    '\\2\''

Although it would be more convenient to switch to double quotes to make the inclusion of the literal single quote easier:

    "\\2'"

How does that work? As the string is being built, the \\ is interpreted as a literal backslash, so the actual characters in the string's value end up being:

    \2'

THAT is what is then passed into the sub() function, where \2 means to substitute the text matched by the second group.

This can be made simpler still by using raw strings:

    r"\2'"

In raw strings, backslashes do almost nothing special at all, so you don't need to double them.

I should have thought of that when sending my original answer to your question. Sorry I overlooked it.

--steve
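The effect described above is easy to reproduce. In this sketch the markup is assumed to be plain `<u>` tags or an underline `<span>`, and the sample text is invented. It runs the same substitution once with an ordinary string and once with a raw string as the replacement.

```python
import re

# Assumed markup: <u>...</u> or an underline <span> around a single letter.
PATTERN = re.compile(
    r'(<u>|<span style="text-decoration: underline;">)'
    r'(l|L|n|N|t|T)'
    r'(</span>|</u>)'
)

text = "<u>l</u>omo <u>N</u>omo"

# Ordinary string: Python turns '\2' into the control character chr(2)
# before re.sub ever sees it, so no backreference takes place.
wrong = PATTERN.sub('\2\'', text)

# Raw string: the backslash survives, and re.sub reads \2 as "group 2".
right = PATTERN.sub(r"\2'", text)

print(repr(wrong))
print(repr(right))
```

Printing with repr() makes the stray control character visible as '\x02', which is the "strange character" that otherwise shows up in the output file.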
Re: [Tutor] Regex question
Thanks Steve, you are, from now on, my guru.

This is the final version, the good one!

    contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', r"\2'", contents)

On Wed, March 30, 2011 17:27, Steve Willoughby wrote:
> On 30-Mar-11 08:21, Andrés Chandía wrote:
> > Thanks Kushal and Steve.
> >
> > I think it works. I say "I think" because in the results I get a strange character instead of the letter that should appear.
> >
> > This is my regexp:
> >
> >     contents = re.sub(r'(<u>|<span style="text-decoration: underline;">)(l|L|n|N|t|T)(</span>|</u>)', '\2\'', contents)
>
> Remember that \2 in a string means the ASCII character with the code 002. You need to escape this with an extra backslash:
>
>     '\\2\''
>
> Although it would be more convenient to switch to double quotes to make the inclusion of the literal single quote easier:
>
>     "\\2'"
>
> How does that work? As the string is being built, the \\ is interpreted as a literal backslash, so the actual characters in the string's value end up being:
>
>     \2'
>
> THAT is what is then passed into the sub() function, where \2 means to substitute the text matched by the second group.
>
> This can be made simpler still by using raw strings:
>
>     r"\2'"
>
> In raw strings, backslashes do almost nothing special at all, so you don't need to double them.
>
> I should have thought of that when sending my original answer to your question. Sorry I overlooked it.
>
> --steve

___
andrés chandía
[Tutor] regex question
I use

    regex = ".*" + search + ".*"
    p = re.compile(regex, re.I)

in finding lines in a text file that contain search, a string entered at a prompt.

What regex do I use to find lines in a text file that contain search, where search is a word entered at a prompt?

Thanks,

Dick Moores
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 9:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:
> I use
>
>     regex = ".*" + search + ".*"
>     p = re.compile(regex, re.I)
>
> in finding lines in a text file that contain search, a string entered at a prompt.
>
> What regex do I use to find lines in a text file that contain search, where search is a word entered at a prompt?
>
> Thanks,
>
> Dick Moores

You could use (2.6+ I think):

    word = raw_input('Enter word to search for: ')
    with open('somefile.txt') as f:
        for line in f:
            if word in line:
                print line

You could always try a speed test, but I'm guessing that other than with extremely large files (10k+ lines) you probably won't see much speed difference. Then again, you might!

HTH,
Wayne

p.s. I tend to only use a regex when I absolutely need to, because usually when you try to solve one problem with a regex it becomes two problems.
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 07:55, Wayne Werner waynejwer...@gmail.com wrote:
> On Tue, Jan 4, 2011 at 9:37 AM, Richard D. Moores rdmoo...@gmail.com
>
> You could use (2.6+ I think):
>
>     word = raw_input('Enter word to search for: ')
>     with open('somefile.txt') as f:
>         for line in f:
>             if word in line:
>                 print line

I think I do need a regex for cases such as this:

A file has these 2 lines:

    alksdhjf ksjhdf kjshf dex akjdhf jkdshf jsdhf
    alkdshf jkashd flkjdsf index alkdjshf alkdjshf

And I want only the line that contains the word "dex".

Dick
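The difference between a substring test and a whole-word test can be seen on those two sample lines. This is a small sketch: it uses the \b word-boundary assertion from the re module, and the search term is fixed here rather than read from a prompt.

```python
import re

lines = [
    "alksdhjf ksjhdf kjshf dex akjdhf jkdshf jsdhf",
    "alkdshf jkashd flkjdsf index alkdjshf alkdjshf",
]
word = "dex"

# Plain substring test: "dex" is also found inside "index".
substring_hits = [ln for ln in lines if word in ln]

# \b matches only at the edge of a word, so "index" no longer qualifies.
word_re = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
word_hits = [ln for ln in lines if word_re.search(ln)]

print(len(substring_hits), len(word_hits))
```

Writing the boundaries as raw strings (r"\b") matters: in an ordinary string "\b" is a backspace character, not a regex assertion.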
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 10:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:
>     regex = ".*" + search + ".*"
>     p = re.compile(regex, re.I)
>
> in finding lines in a text file that contain search, a string entered at a prompt.

That's an inefficient regex (though the compiler may be smart enough to prune the unneeded .*).

Just having search as your regex is fine (it will search for the pattern _in_ the string, no need to specify the other parts of the string), but if you're not using any special regex characters you're probably better off not using a regex and just using a string operation.

Regexes are great for trying to do powerful and complicated things - and as such may be too complicated if you're trying to do a simple thing.

--
Brett Ritter / SwiftOne
swift...@swiftone.org
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 09:31, Brett Ritter swift...@swiftone.org wrote:
> On Tue, Jan 4, 2011 at 10:37 AM, Richard D. Moores rdmoo...@gmail.com wrote:
> >     regex = ".*" + search + ".*"
> >     p = re.compile(regex, re.I)
>
> Just having search as your regex is fine (it will search for the pattern _in_ the string, no need to specify the other parts of the string),

I see. Thanks.

> but if you're not using any special regex characters you're probably better off not using a regex and just using a string operation.

Please see my reply to Wayne Werner.

Dick
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 10:41, Richard D. Moores rdmoo...@gmail.com wrote:
> Please see http://tutoree7.pastebin.com/z9YeSYRw . I'm actually searching RTF files, not TXT files. I want to modify this script to handle searching on a word. So what, for example, should line 71 be?

OK, I think I've got it. In place of lines 66-75 I now have

    search = input("first search string: ")
    search = "\\b" + search + "\\b"
    if not search:
        print("Bye")
        sys.exit()
    elif search[0] != ' ':
        p = re.compile(search, re.I)
    else:
        p = re.compile(search)

Dick
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 11:57, Richard D. Moores rdmoo...@gmail.com wrote:
> On Tue, Jan 4, 2011 at 10:41, Richard D. Moores rdmoo...@gmail.com wrote:
> > Please see http://tutoree7.pastebin.com/z9YeSYRw . I'm actually searching RTF files, not TXT files. I want to modify this script to handle searching on a word. So what, for example, should line 71 be?
>
> OK, I think I've got it. In place of lines 66-75 I now have
>
>     search = input("first search string: ")
>     search = "\\b" + search + "\\b"
>     if not search:
>         print("Bye")
>         sys.exit()
>     elif search[0] != ' ':
>         p = re.compile(search, re.I)
>     else:
>         p = re.compile(search)

Oops. That should be

    search = input("first search string: ")
    if not search:
        print("Bye")
        sys.exit()
    elif search[0] != ' ':
        search = "\\b" + search + "\\b"
        p = re.compile(search, re.I)
    else:
        search = "\\b" + search + "\\b"
        p = re.compile(search)

Dick
Re: [Tutor] regex question
On 01/-10/-28163 02:59 PM, Richard D. Moores wrote:
> On Tue, Jan 4, 2011 at 11:57, Richard D. Moores rdmoo...@gmail.com wrote:
> > On Tue, Jan 4, 2011 at 10:41, Richard D. Moores rdmoo...@gmail.com wrote:
> > > Please see http://tutoree7.pastebin.com/z9YeSYRw . I'm actually searching RTF files, not TXT files. I want to modify this script to handle searching on a word. So what, for example, should line 71 be?
> >
> > OK, I think I've got it. In place of lines 66-75 I now have
> >
> >     search = input("first search string: ")
> >     search = "\\b" + search + "\\b"
> >     if not search:
> >         print("Bye")
> >         sys.exit()
> >     elif search[0] != ' ':
> >         p = re.compile(search, re.I)
> >     else:
> >         p = re.compile(search)
>
> Oops. That should be
>
>     search = input("first search string: ")
>     if not search:
>         print("Bye")
>         sys.exit()
>     elif search[0] != ' ':
>         search = "\\b" + search + "\\b"
>         p = re.compile(search, re.I)
>     else:
>         search = "\\b" + search + "\\b"
>         p = re.compile(search)
>
> Dick

One hazard is if the string the user inputs has any regex special characters in it. If it's anything but letters and digits you probably want to escape it before combining it with your \\b strings.

DaveA
Re: [Tutor] regex question
Dave Angel wrote:
> One hazard is if the string the user inputs has any regex special characters in it. If it's anything but letters and digits you probably want to escape it before combining it with your \\b strings.

It is best to escape any user input before passing it to regex regardless. The re.escape function will do the right thing whether the string is all letters and digits or not.

    >>> re.escape("dev")
    'dev'
    >>> re.escape("dev+")
    'dev\\+'

--
Steven
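Escaping and word boundaries interact in one way that is easy to miss: \b needs a word character on one side, so a term that ends in punctuation, such as "dev+", never matches when wrapped in \b. The sketch below shows one possible alternative that uses negative lookarounds instead. The helper name word_pattern is made up for the example.

```python
import re


def word_pattern(term, ignore_case=True):
    """Compile a pattern that matches `term` literally, as a whole word."""
    flags = re.IGNORECASE if ignore_case else 0
    # (?<!\w) and (?!\w) behave like \b for ordinary words, and they also
    # work when the term itself starts or ends with punctuation.
    return re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", flags)


print(re.escape("dev+"))
print(bool(word_pattern("dev+").search("use dev+ here")))
print(bool(word_pattern("dex").search("index")))
```

For search terms made only of letters and digits, the lookaround form and the \b form give the same results.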
Re: [Tutor] regex question
On Tue, Jan 4, 2011 at 14:58, Steven D'Aprano st...@pearwood.info wrote:
> Dave Angel wrote:
> > One hazard is if the string the user inputs has any regex special characters in it. If it's anything but letters and digits you probably want to escape it before combining it with your \\b strings.
>
> It is best to escape any user input before passing it to regex regardless. The re.escape function will do the right thing whether the string is all letters and digits or not.
>
>     >>> re.escape("dev")
>     'dev'
>     >>> re.escape("dev+")
>     'dev\\+'

I didn't know about re.escape. From the 3.1.3 docs:

    re.escape(string)
    Return string with all non-alphanumerics backslashed; this is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it.

I'm writing the script for my own use, and don't expect to be searching on non-alphanumerics. Even so, I'd like to incorporate re.escape. However, I'm using ' ' to set case sensitive searches, and '=' to set word searches. Would you take a look at my revised script at http://tutoree7.pastebin.com/wQHVV68U, lines 72-97? I tried using line 80, but I can't because '=' is a regular expression metacharacter. I could use some other character instead of '=', but I would want it to be one that can be typed easily without using the shift key. '=' is the best, I think. I did try to use 'qq' instead of '=', but that got messy.

Or is there another, completely different way to do what I do in lines 72-97 with ' ' and '=' that wouldn't involve increasing the number of prompts? Right now, the user has to respond to 4 prompts, even though some responses are quickly made: either by entering nothing, or by entering anything.

Dick
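One way to keep single-character prefixes and still use re.escape is to strip the prefixes off before escaping, so that only the actual search term is escaped. Strictly speaking, '=' is not a regex metacharacter; the clash arises because re.escape, as documented above, backslashes every non-alphanumeric. The sketch below is hypothetical: the pastebin script is not visible here, so the function name build_pattern and the exact prefix convention (a leading space for case-sensitive, a leading '=' for whole-word) are assumptions modelled on the description in the message.

```python
import re


def build_pattern(raw):
    """Turn one prompt reply into a compiled pattern.

    Assumed convention (hypothetical): a leading space asks for a
    case-sensitive search, a leading '=' asks for a whole-word search,
    and the two prefixes may be combined in either order.
    """
    case_sensitive = False
    whole_word = False
    # Consume control prefixes first so they never reach re.escape.
    while raw[:1] in (" ", "="):
        if raw[0] == " ":
            case_sensitive = True
        else:
            whole_word = True
        raw = raw[1:]
    body = re.escape(raw)  # escape only the real search term
    if whole_word:
        body = r"\b" + body + r"\b"
    return re.compile(body, 0 if case_sensitive else re.IGNORECASE)


print(build_pattern("=dex").pattern)
print(build_pattern(" Dex").flags & re.IGNORECASE)
```

Two limitations of this sketch: a search term that really begins with a space or '=' cannot be expressed, and an empty reply compiles to a pattern that matches everything, so the caller would still need its own check for empty input.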