Re: Parsing for email addresses

Tim Chase Mon, 15 Feb 2010 16:40:55 -0800

Jonathan Gardner wrote:

On Feb 15, 3:34 pm, galileo228 <[email protected]> wrote:

I'm trying to write python code that will open a textfile and find the
email addresses inside it. I then want the code to take just the
characters to the left of the "@" symbol, and place them in a list.
(So if [email protected] was in the file, 'galileo228' would be
added to the list.)


Any suggestions would be much appreciated!


You may want to use regexes for this. For every match, split on '@'
and take the first bit.
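
A minimal sketch of that suggestion, assuming a deliberately loose
pattern (anything without whitespace or angle brackets around an "@").
The names ADDRESS and local_parts and the sample addresses are made up
for illustration and are not from the original question:

```python
import re

# Deliberately loose pattern: runs of characters around a single "@".
# This is an assumption for illustration, not an RFC-compliant matcher.
ADDRESS = re.compile(r'[^\s@<>]+@[^\s@<>]+')

def local_parts(text):
    """Return the part before '@' for each address-like token in text."""
    return [found.split('@')[0] for found in ADDRESS.findall(text)]

# Hypothetical sample text.
sample = "Contact alice@example.com or <bob.smith@example.org> today."
print(local_parts(sample))  # ['alice', 'bob.smith']
```

Splitting after the match keeps the pattern simple, at the cost of
scanning each address twice.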

Note that the actual specification for email addresses is far more
than a single regex can handle. However, for almost every single case
out there nowadays, a regex will get what you need.

You can even capture the part as you find the regexps.

As Jonathan mentions, finding RFC-compliant email addresses can be a
hairy/intractable problem. But you can get a pretty close
approximation:


  import re

  r = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
  #                                        ^
  # if you want to allow local domains like
  #   u...@localhost
  # then change the "+" marked with the "^"
  # to a "*" and the "{2,5}" to "+" to unlimit
  # the TLD.  This will change the outcome
  # of the last test "j...@com" to True

  for test, expected in (
      ('[email protected]', True),
      ('[email protected]', True),
      ('@example.com', False),
      ('@sub.example.com', False),
      ('@com', False),
      ('j...@com', False),
      ):
    m = r.match(test)
    if bool(m) ^ expected:
      print("Failed: %r should be %s" % (test, expected))

  emails = set()
  with open('test.txt') as f:  # open() replaces the Python 2-only file()
    for line in f:
      for match in r.finditer(line):
        emails.add(match.group(1))  # group(1) is the part before the "@"
  print("All the emails: " + ', '.join(emails))
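
To make the comment about local domains concrete, here is a rough
side-by-side of the pattern as posted and the relaxed variant that
comment describes ("+" changed to "*", "{2,5}" changed to "+"). The
names strict and relaxed and the test addresses are hypothetical,
chosen only for this sketch:

```python
import re

# Pattern as posted: at least one dotted label plus a 2-5 character TLD.
strict = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
# Relaxed variant from the comment: "+" -> "*" and "{2,5}" -> "+".
relaxed = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)*(?:\w+)', re.I)

# Hypothetical addresses, made up for this illustration.
for address in ('someone@localhost', 'someone@com', 'someone@example.com'):
    print(address, bool(strict.match(address)), bool(relaxed.match(address)))
```

Only the last address should match the strict pattern, while all three
should match the relaxed one. Both still reject an empty local part
such as '@com'.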

-tkc






