André Roberge wrote: > Sorry for the simple question, but I find regular > expressions rather intimidating. And I've never > needed them before ... > > How would I go about to 'define' a regular expression that > would identify strings like > __alphanumerical__ as in __init__ > (Just to spell things out, as I have seen underscores disappear > from messages before, that's 2 underscores immediately > followed by an alphanumerical string immediately followed > by 2 underscore; in other words, a python 'private' method). > > Simple one-liner would be good. > One-liner with explanation would be better. > > One-liner with explanation, and pointer to 'great tutorial' > (for future reference) would probably be ideal. > (I know, google is my friend for that last part. :-) > > Andre
Firstly, some corrections: (1) google is your friend for _all_ parts of your question (2) Python has an initial P and doesn't have private methods. Read this: >>> pat1 = r'__[A-Za-z0-9_]*__' >>> pat2 = r'__\w*__' >>> import re >>> tests = ['x', '__', '____', '_____', '__!__', '__a__', '__Z__', '__8__', '__xyzzy__', '__plugh'] >>> [x for x in tests if re.search(pat1, x)] ['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__'] >>> [x for x in tests if re.search(pat2, x)] ['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__'] >>> I've interpreted your question as meaning "valid Python identifier that starts and ends with two [implicitly, or more] underscores". In the two alternative patterns, the part in the middle says "zero or more instances of a character that can appear in the middle of a Python identifier". The first pattern spells this out as "capital letters, small letters, digits, and underscore". The second pattern uses the \w shorthand to give the same effect. You should be able to follow that from the Python documentation. Now, read this: http://www.amk.ca/python/howto/regex/ HTH, John -- http://mail.python.org/mailman/listinfo/python-list