"Ken D'Ambrosio" <k...@jots.org> writes: > Hi, all. As a recovering Perl guy, I have to admit I don't quite "get" > the re module. For example, I'd like to do a few things (I'm going to use > phone numbers, 'cause that's what I'm currently dealing with): > 12345678900 -- How would I: > - Get just the area code? > - Get just the seven-digit number? > > In Perl, I'd so something like > m/^1(...)(.......)/;
Wouldn't that be better as:

    m/^1(\d{3})(\d{7})$/;

I'll assume that more-precise expression in what follows.

> and then I'd have the numbers in $1 and $2, respectively. But the Python
> stuff simply isn't clicking for me.

In general, where a set of data is likely to be iterated, the Pythonic way
to present it is via a single iterable (instead of, in your Perl example,
separate variables). Then, for those (generally less frequent) cases where
you do want the separate items, you can bind them in a single statement:

    (foo, bar, baz) = some_sequence

or

    (foo, bar, baz) = (item for item in some_sequence)

e.g.:

    >>> (foo, bar, baz) = [1, 2, 3]
    >>> foo
    1
    >>> bar
    2
    >>> baz
    3

So, the match object returned by the various ‘re’ module match functions
gives access to the grouped matches as a sequence, via its groups() method.

> If anyone could supply concrete examples of how to do the problem,
> above, that would be terrific.

Assuming the following (note the raw string, so that '\d' reaches the
regex engine unchanged):

    >>> import re
    >>> phone_number_regex = r'^1(\d{3})(\d{7})$'

Trivial one-shot example:

    >>> phone_number = '12345678900'
    >>> (area_code, local_number) = re.match(phone_number_regex, phone_number).groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'

More explicit example, showing the various steps and assuming you want to
re-use the various values in multiple statements:

    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_pattern
    <_sre.SRE_Pattern object at 0xf7f8c598>
    >>> phone_number = '12345678900'
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_match
    <_sre.SRE_Match object at 0xf7f52338>
    >>> (area_code, local_number) = phone_number_match.groups()
    >>> area_code
    '234'
    >>> local_number
    '5678900'

Python regular expressions also allow naming each group, for later access
to the matches via a dict:

    >>> phone_number_regex = r'^1(?P<area_code>\d{3})(?P<local_number>\d{7})$'
    >>> phone_number_pattern = re.compile(phone_number_regex)
    >>> phone_number_match = phone_number_pattern.match(phone_number)
    >>> phone_number_groups = phone_number_match.groupdict()
    >>> phone_number_groups['area_code']
    '234'
    >>> phone_number_groups['local_number']
    '5678900'

-- 
Ben Finney
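The sessions above assume the input always matches. If it might not, keep in mind that re.match() returns None on failure, so calling .groups() on the result directly would raise AttributeError. The following is a minimal sketch, assuming Python 3; the helper name split_phone_number is hypothetical, chosen only for illustration. It also uses match.group(1) and match.group(2), which are the closest direct counterparts to Perl's $1 and $2:

```python
import re

# Same pattern as in the sessions above, as a raw string so "\d" is
# passed through to the regex engine unchanged.
PHONE_NUMBER_PATTERN = re.compile(r'^1(\d{3})(\d{7})$')


def split_phone_number(phone_number):
    """Return (area_code, local_number), or None if the input doesn't match.

    Hypothetical helper, for illustration only.
    """
    match = PHONE_NUMBER_PATTERN.match(phone_number)
    if match is None:
        # re's match functions return None on failure; calling .groups()
        # on None would raise AttributeError, so check first.
        return None
    # match.group(1) and match.group(2) play the role of Perl's $1 and $2.
    return match.group(1), match.group(2)


print(split_phone_number('12345678900'))  # ('234', '5678900')
print(split_phone_number('555-1234'))     # None
```

Returning None keeps the "no match" check explicit at the call site; raising ValueError instead would be an equally reasonable design choice.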