Jonathan Gardner wrote:
On Feb 15, 3:34 pm, galileo228 <mattbar...@gmail.com> wrote:
I'm trying to write python code that will open a textfile and find the
email addresses inside it. I then want the code to take just the
characters to the left of the "@" symbol, and place them in a list.
(So if galileo...@gmail.com was in the file, 'galileo228' would be
added to the list.)

Any suggestions would be much appreciated!


You may want to use regexes for this. For every match, split on '@'
and take the first bit.
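
A minimal sketch of that find-then-split idea, assuming a deliberately
loose pattern. The names EMAIL_RE and local_parts and the sample
addresses are made up for illustration and are not from the thread:

```python
import re

# Loose approximation of an email address; NOT RFC-compliant.
# (Illustrative pattern, chosen only to demonstrate the split step.)
EMAIL_RE = re.compile(r'[-\w.+]+@[-\w.]+')

def local_parts(text):
    """Return the part left of '@' for each address-like match in text."""
    return [addr.split('@')[0] for addr in EMAIL_RE.findall(text)]

# Hypothetical sample text
sample = "Contact alice@example.com or bob.smith@mail.example.org today."
print(local_parts(sample))
```

Splitting after the match keeps the regex simple; the capture-group
approach does the same work inside the pattern itself.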

Note that the actual specification for email addresses is far more
than a single regex can handle. However, for almost every single case
out there nowadays, a regex will get what you need.

You can even capture the part as you find the regexps.

As Jonathan mentions, finding RFC-compliant email addresses can be a
hairy/intractable problem. But you can get a pretty close
approximation:

  import re

  r = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
  #                                        ^
  # if you want to allow local domains like
  #   u...@localhost
  # then change the "+" marked with the "^"
  # to a "*" and the "{2,5}" to "+" to unlimit
  # the TLD.  This will change the outcome
  # of the last test "j...@com" to True

  for test, expected in (
      ('j...@example.com', True),
      ('j...@sub.example.com', True),
      ('@example.com', False),
      ('@sub.example.com', False),
      ('@com', False),
      ('j...@com', False),
      ):
    m = r.match(test)
    if bool(m) ^ expected:
      print("Failed: %r should be %s" % (test, expected))

  emails = set()
  # open() replaces the Python 2-only file() builtin
  with open('test.txt') as f:
    for line in f:
      for match in r.finditer(line):
        emails.add(match.group(1))
  print("All the emails:", ', '.join(emails))
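
The relaxed variant described in the comment above (the "+" changed to
"*" and the "{2,5}" changed to "+") can be sketched side by side with
the strict one. The names strict and relaxed and the sample addresses
are assumptions for illustration only:

```python
import re

# Strict form from the post: at least one dotted label, 2-5 character TLD.
strict = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)+(?:\w{2,5})', re.I)
# Relaxed form: dotted labels are optional ("*") and the final label
# may be any length ("+"), so bare hostnames are accepted.
relaxed = re.compile(r'([-\w._+]+)@(?:[-\w]+\.)*(?:\w+)', re.I)

# Hypothetical sample addresses, not taken from the thread.
for addr in ('user@example.com', 'user@localhost'):
    print(addr, bool(strict.match(addr)), bool(relaxed.match(addr)))
```

Only the relaxed pattern should accept the bare-hostname address; in
both patterns group(1) still holds the part left of the "@".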

-tkc






--
http://mail.python.org/mailman/listinfo/python-list
