> But, of course, that's not an issue in The Clearly Superior Language, is it?

Ok, if this thread has accomplished little else, it seems to have gotten a couple of people, including myself, to play around with Python.
I have a simple little Perl program at work. It parses a mailbox-format file and extracts email addresses from it; the mailbox file is a collection of bounced-email notifications. The program uses a couple of regular expressions to extract the data and a hash to hold a unique list of the addresses. It then connects to a database to see whether those people have already been flagged as having bad email addresses.
I made a copy of this script, removed the database code, and stripped it down to the basic process: read the file, find email addresses (excluding some administrative addresses such as postmaster), and print the unique bounced addresses. Then I wrote a Python version. It didn't take very long to code; about the same time as the Perl version, if I don't count the time spent reading HOWTOs and the language reference.
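The stripped-down process described above fits in a single function. This is a minimal sketch in present-day Python 3, not the poster's script: the address regex is a hypothetical stand-in (the list archive redacted the original pattern), and the name parse_bounces is made up for illustration. It uses a set for uniqueness where both scripts use a hash/dict with a dummy "bad" value.

```python
import re

# Hypothetical stand-in: the original address pattern was redacted by the
# list archive, so this simple regex is an assumption, not the poster's.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Administrative senders to skip, matched case-insensitively.
IGNORE_RE = re.compile(r"postmaster|mailer-daemon", re.IGNORECASE)


def parse_bounces(lines):
    """Return (total_matches, sorted_unique_addresses) for an iterable of lines.

    Like the original scripts, only the first address found on a line is used.
    """
    total = 0
    unique = set()
    for line in lines:
        match = EMAIL_RE.search(line)
        if match is None:
            continue
        email = match.group(0).lower()
        if IGNORE_RE.search(email):
            continue  # skip postmaster / mailer-daemon addresses
        unique.add(email)
        total += 1
    return total, sorted(unique)
```

For example, the lines "To: Bob@Example.com", "From: MAILER-DAEMON@example.com" and "Final-Recipient: rfc822; bob@example.com" give a total of 2 and a single unique address, bob@example.com.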
$ wc bademail.mai
 8414 29296 419796 bademail.mai
Perl: parseContacts.pl
----------------------------------------------------------
my ($unique_emails,$total_emails);
$unique_emails = 0;
$total_emails = 0;

my %badHash;

my $filename = 'bademail.mai';
open(BADEML,$filename) or die "Error opening $filename";
while($isline=<BADEML>) {

    if($isline =~ /([EMAIL PROTECTED])/) {
        $eml = lc($1);
        next if($eml =~ /(postmaster|mailer-daemon)/i);
        $badHash{$eml} = 'bad';
        $total_emails++;
    }
}

close(BADEML);

foreach $bademail (sort keys(%badHash)) {
    print "$bademail, $badHash{$bademail}\n";
    $unique_emails++;
}

print "$filename: total bounced emails ($total_emails); unique ($unique_emails)\n";
--------------------------------
$ time perl parseContacts.pl
[...snip print email statements...]
bademail.mai: total bounced emails (888); unique (217)
real 0m0.056s
user 0m0.051s
sys 0m0.004s
Python: parseContacts.py
-------------------------------
import sys, re, string

unique_emails = 0
total_emails = 0
badHash = {}
file = "bademail.mai"
try:
    fi = open(file,"r")
    print "Tried to open "+file
except (IOError, OSError):
    print "Trouble opening "+file
    sys.exit(1)

email_pattern = re.compile("([EMAIL PROTECTED])")
ignore_pattern = re.compile("(postmaster|mailer-daemon)",re.I)

for line in fi.readlines():

    email_mo = email_pattern.search(line)

    if email_mo:
        ignore_mo = ignore_pattern.search(email_mo.group(1))

        if not ignore_mo:

            badHash[string.lower(email_mo.group(1))] = "bad"

            total_emails = total_emails + 1

fi.close()

for badmail in badHash.keys():
    print "%s, %s" % (badmail, badHash[badmail])
    unique_emails = unique_emails + 1

print "%s: total bounced emails (%d); unique (%d)"% \
      (file, total_emails, unique_emails)
--------------------------------------
$ time python parseContacts.py
[...snip print email statements...]
bademail.mai: total bounced emails (888); unique (217)
real 0m0.839s
user 0m0.818s
sys 0m0.008s
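One difference between the two scripts besides speed: the Perl version loops over sort keys(%badHash), while the Python version loops over badHash.keys() in arbitrary dictionary order, so the two address lists match as sets but not line for line (and Perl is doing the extra work of sorting). A Python 3 sketch of a report that sorts like the Perl one and uses len() instead of a separate counter; the function name report is hypothetical.

```python
def report(filename, total, bad):
    """Build the output lines: addresses in sorted order, then the summary.

    `bad` maps address -> status, like badHash in both scripts.
    """
    lines = ["%s, %s" % (email, bad[email]) for email in sorted(bad)]
    # len(bad) is the number of unique addresses; no counter needed.
    lines.append("%s: total bounced emails (%d); unique (%d)"
                 % (filename, total, len(bad)))
    return lines
```

Printing the result with print("\n".join(report(...))) should then give the same ordering as the Perl script.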
I was able to write both versions, and they find the same emails. I guess I'll have to look at both scripts in six months to see which one I can still understand. I purposely left out all the commenting I would normally do, to see how well the code stands on its own (but that is _very_ much against my normal practice).
What did I do wrong to make the python code take over ten times as long?
I thought compiled regexp patterns were supposed to be an advantage over non-compiled ones (although I believe Perl caches a compiled regexp if its contents don't change).
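On the compiled-regexp question: as far as I know, CPython's re module also keeps an internal cache of compiled patterns, so calling re.search() repeatedly with the same pattern string does not recompile it every time; precompiling mostly saves a cache lookup per call. The Python 3 sketch below checks that both forms agree and times them with timeit. The pattern, sample line and function names are illustrative assumptions.

```python
import re
import timeit

# Illustrative pattern and sample line (assumptions, not from the scripts).
PATTERN = r"postmaster|mailer-daemon"
LINE = "Final-Recipient: rfc822; MAILER-DAEMON@example.com"
COMPILED = re.compile(PATTERN, re.IGNORECASE)


def with_compiled():
    # Uses the pattern object compiled once at import time.
    return COMPILED.search(LINE) is not None


def with_module_function():
    # re.search() compiles PATTERN on first use, then reuses a cached copy.
    return re.search(PATTERN, LINE, re.IGNORECASE) is not None


def compare(number=100000):
    """Return (seconds_compiled, seconds_module_function) for `number` calls."""
    return (timeit.timeit(with_compiled, number=number),
            timeit.timeit(with_module_function, number=number))
```

If the two timings come out close, precompiling on its own is unlikely to account for a ten-fold gap.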
Is string.lower() slower than Perl's lc()?
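On string.lower() versus lc(): in Python 2.x, the string module's lower() is, I believe, a thin wrapper that calls the string's own lower() method, so calling the method directly saves one extra function call per address. The Python 3 sketch below does that and also swaps the second regular expression for plain substring tests after lowercasing; that swap is an alternative technique, not what either script does, and IGNORED and normalize are hypothetical names.

```python
# Administrative names to skip; hypothetical list mirroring the scripts'
# (postmaster|mailer-daemon) pattern.
IGNORED = ("postmaster", "mailer-daemon")


def normalize(address):
    """Return the lowercased address, or None if it should be ignored.

    Lowercasing first makes case-insensitive matching unnecessary, and an
    unanchored regex search is equivalent to a plain substring test here.
    """
    email = address.lower()  # str method, no string-module wrapper
    if any(word in email for word in IGNORED):
        return None
    return email
```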
Am I using an inferior method of reading a line of text from the file? (I'm instructing Perl to read one line at a time as well, and not gobble the whole thing in one go.)
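On reading the file: fi.readlines() reads the entire file into a list before the loop starts, which is not what the Perl while loop does. Since Python 2.2 a file object can be iterated directly (for line in fi:), which reads one line at a time. A small Python 3 sketch of that pattern; matching_lines is a hypothetical helper, and in the usage io.StringIO stands in for the real file.

```python
import io


def matching_lines(stream, needle):
    """Yield lines containing `needle`, reading one line at a time.

    Iterating over the file object itself does not build a list of every
    line first, unlike stream.readlines().
    """
    for line in stream:
        if needle in line:
            yield line.rstrip("\n")
```

For example, list(matching_lines(io.StringIO("From: a@example.com\nSubject: hi\n"), "@")) yields just the From: line; with a real file the call would be made inside "with open(filename) as fi:".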
If I cut the file down a bit, the results are still over 10x apart:

$ wc bademail.mai
 4725 11919 282057 bademail.mai

Perl:
bademail.mai: total bounced emails (313); unique (68)
real 0m0.030s
user 0m0.025s
sys 0m0.006s

Python:
bademail.mai: total bounced emails (313); unique (68)
real 0m0.532s
user 0m0.521s
sys 0m0.010s

$ wc bademail.mai
   58   282  2099 bademail.mai

Perl:
bademail.mai: total bounced emails (6); unique (2)
real 0m0.009s
user 0m0.004s
sys 0m0.006s

Python:
bademail.mai: total bounced emails (6); unique (2)
real 0m0.044s
user 0m0.035s
sys 0m0.008s
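In the 58-line run, Python's 0.044s is about five times Perl's 0.009s, and with so little input much of that is plausibly fixed interpreter start-up rather than parsing; timing an empty script (time python -c pass) would show that fixed part. A Python 3 sketch for timing just the work inside the process; timed is a hypothetical helper name.

```python
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed_seconds).

    Measuring inside the process leaves out interpreter start-up and
    module imports, which `time python script.py` includes.
    """
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Comparing the in-process time with the real figure from time gives a rough idea of how much of each run is start-up.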
I thought all that talk about speeding up Python with those other programs (or whatever they were) was about getting it on par with compiled code like C. Do I need to use them to get my Python code to run as fast as my Perl code, or am I doing something grossly wrong? I tried python, python1.5, python2, and python2.2, and they all gave practically the same results.
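Before reaching for extra tools, a profiler can show where the time actually goes: regex searching, lowercasing, or I/O. A Python 3 sketch using the standard cProfile and pstats modules; profile_call is a hypothetical helper name, and in practice func would be whatever function wraps the parsing loop.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, limit=10):
    """Run func(*args) under cProfile; return (result, report_text).

    report_text lists the `limit` entries with the highest cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()
```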