Pete Emerson wrote:
I've written my first python program, and would love suggestions for
improvement.

I'm a perl programmer and used a perl version of this program to guide
me. So in that sense, the python is "perlesque"

This script parses /etc/hosts for hostnames, and based on terms given
on the command line (argv), either prints the list of hostnames that
match all the criteria, or uses ssh to connect to the host if the
number of matches is unique.

I am looking for advice along the lines of "an easier way to do this"
or "a more python way" (I'm sure that's asking for trouble!) or
"people commonly do this instead" or "here's a slick trick" or "oh,
interesting, here's my version to do the same thing".

I am aware that there are performance improvements and error checking
that could be made, such as making sure the file exists and is
readable and precompiling the regular expressions and not calculating
how many sys.argv arguments there are more than once. I'm not hyper
concerned with performance or idiot proofing for this particular
script.

Thanks in advance.

########################################################
#!/usr/bin/python

import sys, fileinput, re, os

filename = '/etc/hosts'

hosts = []

for line in open(filename, 'r'):
        match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)

Use 'raw' strings for regular expressions.

'Normal' Python string literals use backslashes in escape sequences (eg,
'\n'), but so do regular expressions, and regular expressions are passed
to the 're' module in strings, so that can lead to a profusion of
backslashes!

You could either escape the backslashes:

        match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line)

or use a raw string:

        match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line)

A raw string literal is just like a normal string literal, except that
backslashes aren't special.

        if match is None or re.search('^(?:float|localhost)\.', line):
continue

It would be more 'Pythonic' to say "not match".

        if not match or re.search(r'^(?:float|localhost)\.', line):

        hostname = match.group(1)
        count = 0
        for arg in sys.argv[1:]:
                for section in hostname.split('.'):
                        if section == arg:
                                count = count + 1

Shorter is:

                                count += 1

                                break
        if count == len(sys.argv) - 1:
                hosts.append(hostname)

if len(hosts) == 1:
        os.system("ssh -A " + hosts[0])
else:
        print '\n'.join(hosts)

You're splitting the hostname repeatedly. You could split it just once,
outside the outer loop, and maybe make it into a set. You could then
have:

        host_parts = set(hostname.split('.'))
        count = 0
        for arg in sys.argv[1:]:
                if arg in host_parts:
                        count += 1

Incidentally, the re module contains a cache, so the regular expressions
won't be recompiled every time (unless you have a lot of them and the re
module flushes the cache).
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to