Re: Evaluate my first python script, please

2010-03-05 Thread Pete Emerson
On Mar 5, 10:19 am, "sjdevn...@yahoo.com"  wrote:
> On Mar 5, 10:53 am, Pete Emerson  wrote:
>
>
>
>
>
> > Thanks for your response, further questions inline.
>
> > On Mar 4, 11:07 am, Tim Wintle  wrote:
>
> > > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote:
> > > > I am looking for advice along the lines of "an easier way to do this"
> > > > or "a more python way" (I'm sure that's asking for trouble!) or
> > > > "people commonly do this instead" or "here's a slick trick" or "oh,
> > > > interesting, here's my version to do the same thing".
>
> > > (1) I would wrap it all in a function
>
> > > def main():
> > >     # your code here
>
> > > if __name__ == "__main__":
> > >     main()
>
> > Is this purely aesthetic reasons, or will I appreciate this when I
> > write my own modules, or something else?
>
> Suppose the above code is in mymodule.py.  By wrapping main() you can:
> 1. Have another module do:
> import mymodule
> ... (so some stuff, perhaps munge sys.argv)
> mymodule.main()
> 2. If mymodule has a small function in it, someone else can import it
> and call that function
> 3. You can run pylint, pychecker and other source-code checkers that
> need to be able to import your module to check it (I wouldn't be
> surprised if recent versions of one or the other of those don't
> require imports, and some checkers like pyflakes certainly don't).
> 4. You can easily have a unit tester call into the module
>
> etc.
>
> > > (2) PEP8 (python style guidelines) suggests one import per line
>
> > > (3) I'd use four spaces as tab width
>
> +1 on both; it's good to get into the habit of writing standard-
> looking Python code.

Agreed, noted, and appreciated, with the caveat that using spaces
instead of tabs might border on an emacs vs. vi flamewar in some
circles. I personally will use spaces going forward.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread sjdevn...@yahoo.com
On Mar 5, 10:53 am, Pete Emerson  wrote:
> Thanks for your response, further questions inline.
>
> On Mar 4, 11:07 am, Tim Wintle  wrote:
>
> > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote:
> > > I am looking for advice along the lines of "an easier way to do this"
> > > or "a more python way" (I'm sure that's asking for trouble!) or
> > > "people commonly do this instead" or "here's a slick trick" or "oh,
> > > interesting, here's my version to do the same thing".
>
> > (1) I would wrap it all in a function
>
> > def main():
> >     # your code here
>
> > if __name__ == "__main__":
> >     main()
>
> Is this purely aesthetic reasons, or will I appreciate this when I
> write my own modules, or something else?

Suppose the above code is in mymodule.py.  By wrapping main() you can:
1. Have another module do:
import mymodule
... (so some stuff, perhaps munge sys.argv)
mymodule.main()
2. If mymodule has a small function in it, someone else can import it
and call that function
3. You can run pylint, pychecker and other source-code checkers that
need to be able to import your module to check it (I wouldn't be
surprised if recent versions of one or the other of those don't
require imports, and some checkers like pyflakes certainly don't).
4. You can easily have a unit tester call into the module

etc.

> > (2) PEP8 (python style guidelines) suggests one import per line
>
> > (3) I'd use four spaces as tab width

+1 on both; it's good to get into the habit of writing standard-
looking Python code.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Tim Wintle
On Fri, 2010-03-05 at 07:53 -0800, Pete Emerson wrote:
> Thanks for your response, further questions inline.
> 
> On Mar 4, 11:07 am, Tim Wintle  wrote:
> > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote:
> > > I am looking for advice along the lines of "an easier way to do this"
> > > or "a more python way" (I'm sure that's asking for trouble!) or
> > > "people commonly do this instead" or "here's a slick trick" or "oh,
> > > interesting, here's my version to do the same thing".
> >
> > (1) I would wrap it all in a function
> >
> > def main():
> > # your code here
> >
> > if __name__ == "__main__":
> > main()
> 
> Is this purely aesthetic reasons, or will I appreciate this when I
> write my own modules, or something else?

It's for when you reuse this code. 

Consider it's in "mymodule.py" (so run with ./mymodule.py) - if you then
make a "tests.py" (for example) you can "import mymodule" without it
automatically running your code.

re-writing it

def main(args):
#your code

if __name__ == "__main__":
main(sys.argv[1:])

would obviously be more sensible for actually writing tests.


> > ... or you could replace whole section between the for loop and
> > hosts.append with:
> >
> > if sorted(hostname.split(".")) == sorted(sys.argv[1:]):
> > host.append(hostname)
> 
> This doesn't actually work, because I'm not looking for a one to one
> mapping of args to sections of hostname, but rather a subset. So
> passing in 'prod sfo' would register a match for '001.webapp.prod.sfo'.

Ah - good point - I guess the the set intersection technique someone
else mentioned is best in that case.

Tim

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Pete Emerson
On Mar 5, 7:00 am, Duncan Booth  wrote:
> Jean-Michel Pichavant  wrote:
> > And tell me how not using regexp will ensure the /etc/hosts processing
> > is correct ? The non regexp solutions provided in this thread did not
> > handled what you rightfully pointed out about host list and commented
> > lines.
>
> It won't make is automatically correct, but I'd guess that written without
> being so dependent on regexes might have made someone point out those
> deficiencies sooner. The point being that casual readers of the code won't
> take the time to decode the regex, they'll glance over it and assume it
> does something or other sensible.
>
> If I was writing that code, I'd read each line, strip off comments and
> leading whitespace (so you can use re.match instead of re.search), split on
> whitespace and take all but the first field. I might check that the field
> I'm ignoring it something like a numeric ip address, but if I did want to
> do then I'd include range checking for valid octets so still no regex.
>
> The whole of that I'd wrap in a generator so what you get back is a
> sequence of host names.
>
> However that's just me. I'm not averse to regular expressions, I've written
> some real mammoths from time to time, but I do avoid them when there are
> simpler clearer alternatives.
>
> > And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> > ...
> > Ok that's totally unfair :D You're right I made a mistake.  Still the
> > comment is absolutely required (provided it's correct).
>
> Yes, the comment would have been good had it been correct. I'd also go for
> a named group as that provides additional context within the regex.
>
> Also if there are several similar regular expressions in the code, or if
> they get too complex I'd build them up in parts. e.g.
>
> OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
> ADDRESS = (OCTET + r'\.') * 3 + OCTET
> HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
>   # could use \S+ but my Linux manual says
>   # alphanumeric, dash and dots only
> ... and so on ...
>
> which provides another way of documenting the intentions of the regex.
>
> BTW, I'm not advocating that here, the above patterns would be overkill,
> but in more complex situations thats what I'd do.
>
> --
> Duncan Boothhttp://kupuguy.blogspot.com

All good comments here. The takeaway for my lazy style of regexes
(which makes it harder for non-regex fiends to read, regardless of the
language) is that there are ways to make regexes much more readable to
the untrained eye. Duncan, I like your method of defining sections of
the regex outside the regex itself, even if it's a one time use.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Pete Emerson
Thanks for your response, further questions inline.

On Mar 4, 11:07 am, Tim Wintle  wrote:
> On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote:
> > I am looking for advice along the lines of "an easier way to do this"
> > or "a more python way" (I'm sure that's asking for trouble!) or
> > "people commonly do this instead" or "here's a slick trick" or "oh,
> > interesting, here's my version to do the same thing".
>
> (1) I would wrap it all in a function
>
> def main():
>     # your code here
>
> if __name__ == "__main__":
>     main()

Is this purely aesthetic reasons, or will I appreciate this when I
write my own modules, or something else?

>
> (2) PEP8 (python style guidelines) suggests one import per line
>
> (3) I'd use four spaces as tab width
>
> (4)
> I'd change this:
>
> >     for arg in sys.argv[1:]:
> >         for section in hostname.split('.'):
> >             if section == arg:
> >                 count = count + 1
> >                 break
>
> to something more like:
>
>     for section in hostname.split("."):
>         if section in sys.argv[1:]:
>             count += 1

Ah, yes, I like that. It moves towards the set notation I've wound up
with. Definitely more readable to me.

>
> (although as you suggested I'd only calculate sys.argv[1:] once)
>
> ... or you could replace whole section between the for loop and
> hosts.append with:
>
>     if sorted(hostname.split(".")) == sorted(sys.argv[1:]):
>         host.append(hostname)

This doesn't actually work, because I'm not looking for a one to one
mapping of args to sections of hostname, but rather a subset. So
passing in 'prod sfo' would register a match for '001.webapp.prod.sfo'.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Duncan Booth
Jean-Michel Pichavant  wrote:

> And tell me how not using regexp will ensure the /etc/hosts processing
> is correct ? The non regexp solutions provided in this thread did not 
> handled what you rightfully pointed out about host list and commented
> lines. 

It won't make is automatically correct, but I'd guess that written without 
being so dependent on regexes might have made someone point out those 
deficiencies sooner. The point being that casual readers of the code won't 
take the time to decode the regex, they'll glance over it and assume it 
does something or other sensible.

If I was writing that code, I'd read each line, strip off comments and 
leading whitespace (so you can use re.match instead of re.search), split on 
whitespace and take all but the first field. I might check that the field 
I'm ignoring it something like a numeric ip address, but if I did want to 
do then I'd include range checking for valid octets so still no regex.

The whole of that I'd wrap in a generator so what you get back is a 
sequence of host names.

However that's just me. I'm not averse to regular expressions, I've written 
some real mammoths from time to time, but I do avoid them when there are 
simpler clearer alternatives.

> And FYI, the OP pattern does match '192.168.200.1 (foo123)'
> ...
> Ok that's totally unfair :D You're right I made a mistake.  Still the 
> comment is absolutely required (provided it's correct).
> 
Yes, the comment would have been good had it been correct. I'd also go for 
a named group as that provides additional context within the regex.

Also if there are several similar regular expressions in the code, or if 
they get too complex I'd build them up in parts. e.g.

OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])'
ADDRESS = (OCTET + r'\.') * 3 + OCTET
HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*'
  # could use \S+ but my Linux manual says
  # alphanumeric, dash and dots only
... and so on ...

which provides another way of documenting the intentions of the regex.

BTW, I'm not advocating that here, the above patterns would be overkill, 
but in more complex situations thats what I'd do.

-- 
Duncan Booth http://kupuguy.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Jean-Michel Pichavant

Duncan Booth wrote:

Jean-Michel Pichavant  wrote:

  

You've already been given good advices.
Unlike some of those, I don't think using regexp an issue in any way. 
Yes they are not readable, for sure, I 100% agree to that statement, but 
you can make them readable very easily.


# not readable
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)


# readable
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match 
'192.168.200.1   (foo123)'


I gladly understand ppl that are not willing to use regexp (or as last 
resort), but for a perl guy, that should not be an issue.



The problem with your comment is that just by existing it means people will 
be misled into thinking the regex is being used to match strings like

  '192.168.200.1   (foo123)'
when in fact the OP is expecting to match strings like
  '192.168.200.1   foo123'

i.e. you may mislead some people into thinking the parentheses are part of 
what is being matched when they're actually grouping within the regex.


Another problem with either regular expression line is that it makes it 
less easy to see that it doesn't do what the OP wanted. /etc/hosts contains  
lines with an IP address followed by multiple hostnames. It may also 
contain comments either as complete lines or at the end of lines. The code 
given will miss hostnames after the first and will include hostnames in 
commented out lines.


  
And tell me how not using regexp will ensure the /etc/hosts processing 
is correct ? The non regexp solutions provided in this thread did not 
handled what you rightfully pointed out about host list and commented lines.


And FYI, the OP pattern does match '192.168.200.1 (foo123)'
...
Ok that's totally unfair :D You're right I made a mistake.  Still the 
comment is absolutely required (provided it's correct).


JM
--
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Duncan Booth
Jean-Michel Pichavant  wrote:

> You've already been given good advices.
> Unlike some of those, I don't think using regexp an issue in any way. 
> Yes they are not readable, for sure, I 100% agree to that statement, but 
> you can make them readable very easily.
> 
> # not readable
> match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
> 
> 
> # readable
> match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match 
> '192.168.200.1   (foo123)'
> 
> I gladly understand ppl that are not willing to use regexp (or as last 
> resort), but for a perl guy, that should not be an issue.

The problem with your comment is that just by existing it means people will 
be misled into thinking the regex is being used to match strings like
  '192.168.200.1   (foo123)'
when in fact the OP is expecting to match strings like
  '192.168.200.1   foo123'

i.e. you may mislead some people into thinking the parentheses are part of 
what is being matched when they're actually grouping within the regex.

Another problem with either regular expression line is that it makes it 
less easy to see that it doesn't do what the OP wanted. /etc/hosts contains  
lines with an IP address followed by multiple hostnames. It may also 
contain comments either as complete lines or at the end of lines. The code 
given will miss hostnames after the first and will include hostnames in 
commented out lines.

-- 
Duncan Booth http://kupuguy.blogspot.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Jean-Michel Pichavant

Pete Emerson wrote:

I've written my first python program, and would love suggestions for
improvement.

I'm a perl programmer and used a perl version of this program to guide
me. So in that sense, the python is "perlesque"

This script parses /etc/hosts for hostnames, and based on terms given
on the command line (argv), either prints the list of hostnames that
match all the criteria, or uses ssh to connect to the host if the
number of matches is unique.

I am looking for advice along the lines of "an easier way to do this"
or "a more python way" (I'm sure that's asking for trouble!) or
"people commonly do this instead" or "here's a slick trick" or "oh,
interesting, here's my version to do the same thing".

I am aware that there are performance improvements and error checking
that could be made, such as making sure the file exists and is
readable and precompiling the regular expressions and not calculating
how many sys.argv arguments there are more than once. I'm not hyper
concerned with performance or idiot proofing for this particular
script.

Thanks in advance.


#!/usr/bin/python

import sys, fileinput, re, os

filename = '/etc/hosts'

hosts = []

for line in open(filename, 'r'):
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
if match is None or re.search('^(?:float|localhost)\.', line):
continue
hostname = match.group(1)
count = 0
for arg in sys.argv[1:]:
for section in hostname.split('.'):
if section == arg:
count = count + 1
break
if count == len(sys.argv) - 1:
hosts.append(hostname)

if len(hosts) == 1:
os.system("ssh -A " + hosts[0])
else:
print '\n'.join(hosts)
  

You've already been given good advices.
Unlike some of those, I don't think using regexp an issue in any way. 
Yes they are not readable, for sure, I 100% agree to that statement, but 
you can make them readable very easily.


# not readable
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)


# readable
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match 
'192.168.200.1   (foo123)'


I gladly understand ppl that are not willing to use regexp (or as last 
resort), but for a perl guy, that should not be an issue.


JM

--
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-05 Thread Tim Wintle
On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote:
> I am looking for advice along the lines of "an easier way to do this"
> or "a more python way" (I'm sure that's asking for trouble!) or
> "people commonly do this instead" or "here's a slick trick" or "oh,
> interesting, here's my version to do the same thing".

(1) I would wrap it all in a function

def main():
# your code here

if __name__ == "__main__":
main()

(2) PEP8 (python style guidelines) suggests one import per line

(3) I'd use four spaces as tab width

(4) 
I'd change this:

> for arg in sys.argv[1:]:
> for section in hostname.split('.'):
> if section == arg:
> count = count + 1
> break

to something more like:

for section in hostname.split("."):
if section in sys.argv[1:]:
count += 1

(although as you suggested I'd only calculate sys.argv[1:] once)

... or you could replace whole section between the for loop and
hosts.append with:

if sorted(hostname.split(".")) == sorted(sys.argv[1:]):
host.append(hostname)


, at a slight loss of clarity - but I think I'd stick with the more
verbose version personally.

Tim



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-04 Thread Pete Emerson
Great responses, thank you all very much. I read Jonathan Gardner's
solution first and investigated sets. It's clearly superior to my
first cut.

I love the comment about regular expressions. In perl, I've reached
for regexes WAY too much. That's a big lesson learned too, and from my
point of view not because of performance (although that's most likely
a bonus) but because of code readability.

Here's the version I have now. It looks quite similar to sjdevnull's.
Further comments appreciated, and thanks again.

#!/usr/bin/env python

import sys, fileinput, os

filename = '/etc/hosts'

hosts = []

search_terms = set(sys.argv[1:])

for line in open(filename, 'r'):
if line.startswith('#'): continue
try:
hostname = line.strip().split('\t')[2]  # The 
host I want is always
at the 3rd tab.
except IndexError:
continue
if hostname.startswith('float.') or
hostname.startswith('localhost.'):
continue
if search_terms <= set(hostname.split('.')):  # same as if
search_terms.issubset(hostname.split('.')):
hosts.append(hostname)

if len(hosts) == 1:
os.execl("/usr/bin/ssh", '-A', hosts[0])
else:
for host in hosts:
print host

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-04 Thread sjdevn...@yahoo.com
On Mar 4, 1:39 pm, Pete Emerson  wrote:
> I've written my first python program, and would love suggestions for
> improvement.
>
> I'm a perl programmer and used a perl version of this program to guide
> me. So in that sense, the python is "perlesque"
>
> This script parses /etc/hosts for hostnames, and based on terms given
> on the command line (argv), either prints the list of hostnames that
> match all the criteria, or uses ssh to connect to the host if the
> number of matches is unique.
>
> I am looking for advice along the lines of "an easier way to do this"
> or "a more python way" (I'm sure that's asking for trouble!) or
> "people commonly do this instead" or "here's a slick trick" or "oh,
> interesting, here's my version to do the same thing".
>
> I am aware that there are performance improvements and error checking
> that could be made, such as making sure the file exists and is
> readable and precompiling the regular expressions and not calculating
> how many sys.argv arguments there are more than once. I'm not hyper
> concerned with performance or idiot proofing for this particular
> script.
>
> Thanks in advance.
>
> 
> #!/usr/bin/python
>
> import sys, fileinput, re, os

'Some people, when confronted with a problem, think "I know, I’ll use
regular expressions." Now they have two problems.' — Jamie Zawinski

Seriously, regexes can be very useful but there's no need for them
here.  Simpler is usually better, and easier to understand.

> filename = '/etc/hosts'
>
> hosts = []
>
> for line in open(filename, 'r'):
>         match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>         if match is None or re.search('^(?:float|localhost)\.', line):
> continue
>         hostname = match.group(1)

I find this much clearer without regexes:

try:
ip, hostname = line.strip().split(None, 1)
except IndexError:
continue
# I think this is equivalent to your re, but I'm not sure it's what
# you actually meant...
#if line.startswith("float.") or line.startswith("localhost."):
#continue
# I'm going with:
if hostname.startswith("float.") or hostname.startswith("localhost"):
continue


>         count = 0
>         for arg in sys.argv[1:]:
>                 for section in hostname.split('.'):
>                         if section == arg:
>                                 count = count + 1
>                                 break
>         if count == len(sys.argv) - 1:
>                 hosts.append(hostname)

A perfect application of sets.

#initialize at program outset
args = set(sys.argv[1:])
...
hostparts = set(hostname.split("."))
if hostparts & args:
hosts.append(hostname)


Full program:

import sys
import os
filename = '/etc/hosts'

hosts = []
args = set(sys.argv[1:])
for line in open(filename, 'r'):
# Parse line into ip address and hostname, skipping bogus lines
try:
ipaddr, hostname = line.strip().split(None, 1)
except ValueError:
continue
if hostname.startswith("float.") or
hostname.startswith("localhost."):
continue

# Add to hosts if it matches at least one argument
hostparts = set(hostname.split("."))
if hostparts & args:
hosts.append(hostname)

# If there's only one match, ssh to it--otherwise print out the
matches
if len(hosts) == 1:
os.system("ssh -A %s"%hosts[0])
else:
for host in hosts:
print host
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-04 Thread Jonathan Gardner
On Thu, Mar 4, 2010 at 10:39 AM, Pete Emerson  wrote:
>
> #!/usr/bin/python
>

More common:

#!/usr/bin/env python

> import sys, fileinput, re, os
>
> filename = '/etc/hosts'
>
> hosts = []
>
> for line in open(filename, 'r'):
>        match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>        if match is None or re.search('^(?:float|localhost)\.', line):
> continue
>        hostname = match.group(1)

In Python, we use REs as a last resort. See "dir("")" or "help("")"
for why we don't use REs if we can avoid them.

The above is better represented with:

try:
  ipaddr, hostnames = line.split(None, 1)
except IndexError:
  continue

if line.find('float') >=0 or line.find('localhost') >= 0:
  continue


>        count = 0
>        for arg in sys.argv[1:]:
>                for section in hostname.split('.'):
>                        if section == arg:
>                                count = count + 1
>                                break
>        if count == len(sys.argv) - 1:
>                hosts.append(hostname)
>

You can use the "else" in a "for". as well.

for arg in sys.argv[1:]:
 for section in hostname.split('.'):
 if section == arg:
 break
else: # Run only if for completes naturally
  hosts.append(hostname)


It may be clearer to do set arithmetic as well.

> if len(hosts) == 1:
>        os.system("ssh -A " + hosts[0])

You probably want one of the os.exec* methods instead, since you
aren't going to do anything after this.

> else:
>        print '\n'.join(hosts)

Rather, the idiom is usually:

for host in hosts:
  print host

-- 
Jonathan Gardner
jgard...@jonathangardner.net
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-04 Thread nn
On Mar 4, 2:30 pm, MRAB  wrote:
> Pete Emerson wrote:
> > I've written my first python program, and would love suggestions for
> > improvement.
>
> > I'm a perl programmer and used a perl version of this program to guide
> > me. So in that sense, the python is "perlesque"
>
> > This script parses /etc/hosts for hostnames, and based on terms given
> > on the command line (argv), either prints the list of hostnames that
> > match all the criteria, or uses ssh to connect to the host if the
> > number of matches is unique.
>
> > I am looking for advice along the lines of "an easier way to do this"
> > or "a more python way" (I'm sure that's asking for trouble!) or
> > "people commonly do this instead" or "here's a slick trick" or "oh,
> > interesting, here's my version to do the same thing".
>
> > I am aware that there are performance improvements and error checking
> > that could be made, such as making sure the file exists and is
> > readable and precompiling the regular expressions and not calculating
> > how many sys.argv arguments there are more than once. I'm not hyper
> > concerned with performance or idiot proofing for this particular
> > script.
>
> > Thanks in advance.
>
> > 
> > #!/usr/bin/python
>
> > import sys, fileinput, re, os
>
> > filename = '/etc/hosts'
>
> > hosts = []
>
> > for line in open(filename, 'r'):
> >    match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>
> Use 'raw' strings for regular expressions.
>
> 'Normal' Python string literals use backslashes in escape sequences (eg,
> '\n'), but so do regular expressions, and regular expressions are passed
> to the 're' module in strings, so that can lead to a profusion of
> backslashes!
>
> You could either escape the backslashes:
>
>         match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line)
>
> or use a raw string:
>
>         match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
>
> A raw string literal is just like a normal string literal, except that
> backslashes aren't special.
>
> >    if match is None or re.search('^(?:float|localhost)\.', line):
> > continue
>
> It would be more 'Pythonic' to say "not match".
>
>         if not match or re.search(r'^(?:float|localhost)\.', line):
>
> >    hostname = match.group(1)
> >    count = 0
> >    for arg in sys.argv[1:]:
> >            for section in hostname.split('.'):
> >                    if section == arg:
> >                            count = count + 1
>
> Shorter is:
>
>                                 count += 1
>
> >                            break
> >    if count == len(sys.argv) - 1:
> >            hosts.append(hostname)
>
> > if len(hosts) == 1:
> >    os.system("ssh -A " + hosts[0])
> > else:
> >    print '\n'.join(hosts)
>
> You're splitting the hostname repeatedly. You could split it just once,
> outside the outer loop, and maybe make it into a set. You could then
> have:
>
>          host_parts = set(hostname.split('.'))
>         count = 0
>         for arg in sys.argv[1:]:
>                 if arg in host_parts:
>                         count += 1
>
> Incidentally, the re module contains a cache, so the regular expressions
> won't be recompiled every time (unless you have a lot of them and the re
> module flushes the cache).

I think that you really just need a set intersection for the count
part and this should work:

host_parts = set(hostname.split('.'))
count = len(host_parts.intersection(sys.argv[1:]))

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Evaluate my first python script, please

2010-03-04 Thread MRAB

Pete Emerson wrote:

I've written my first python program, and would love suggestions for
improvement.

I'm a perl programmer and used a perl version of this program to guide
me. So in that sense, the python is "perlesque"

This script parses /etc/hosts for hostnames, and based on terms given
on the command line (argv), either prints the list of hostnames that
match all the criteria, or uses ssh to connect to the host if the
number of matches is unique.

I am looking for advice along the lines of "an easier way to do this"
or "a more python way" (I'm sure that's asking for trouble!) or
"people commonly do this instead" or "here's a slick trick" or "oh,
interesting, here's my version to do the same thing".

I am aware that there are performance improvements and error checking
that could be made, such as making sure the file exists and is
readable and precompiling the regular expressions and not calculating
how many sys.argv arguments there are more than once. I'm not hyper
concerned with performance or idiot proofing for this particular
script.

Thanks in advance.


#!/usr/bin/python

import sys, fileinput, re, os

filename = '/etc/hosts'

hosts = []

for line in open(filename, 'r'):
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)


Use 'raw' strings for regular expressions.

'Normal' Python string literals use backslashes in escape sequences (eg,
'\n'), but so do regular expressions, and regular expressions are passed
to the 're' module in strings, so that can lead to a profusion of
backslashes!

You could either escape the backslashes:

match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line)

or use a raw string:

match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line)

A raw string literal is just like a normal string literal, except that
backslashes aren't special.


if match is None or re.search('^(?:float|localhost)\.', line):
continue


It would be more 'Pythonic' to say "not match".

if not match or re.search(r'^(?:float|localhost)\.', line):


hostname = match.group(1)
count = 0
for arg in sys.argv[1:]:
for section in hostname.split('.'):
if section == arg:
count = count + 1


Shorter is:

count += 1


break
if count == len(sys.argv) - 1:
hosts.append(hostname)

if len(hosts) == 1:
os.system("ssh -A " + hosts[0])
else:
print '\n'.join(hosts)


You're splitting the hostname repeatedly. You could split it just once,
outside the outer loop, and maybe make it into a set. You could then
have:

host_parts = set(hostname.split('.'))
count = 0
for arg in sys.argv[1:]:
if arg in host_parts:
count += 1

Incidentally, the re module contains a cache, so the regular expressions
won't be recompiled every time (unless you have a lot of them and the re
module flushes the cache).
--
http://mail.python.org/mailman/listinfo/python-list


Evaluate my first python script, please

2010-03-04 Thread Pete Emerson
I've written my first python program, and would love suggestions for
improvement.

I'm a perl programmer and used a perl version of this program to guide
me. So in that sense, the python is "perlesque"

This script parses /etc/hosts for hostnames, and based on terms given
on the command line (argv), either prints the list of hostnames that
match all the criteria, or uses ssh to connect to the host if the
number of matches is unique.

I am looking for advice along the lines of "an easier way to do this"
or "a more python way" (I'm sure that's asking for trouble!) or
"people commonly do this instead" or "here's a slick trick" or "oh,
interesting, here's my version to do the same thing".

I am aware that there are performance improvements and error checking
that could be made, such as making sure the file exists and is
readable and precompiling the regular expressions and not calculating
how many sys.argv arguments there are more than once. I'm not hyper
concerned with performance or idiot proofing for this particular
script.

Thanks in advance.


#!/usr/bin/python

import sys, fileinput, re, os

filename = '/etc/hosts'

hosts = []

for line in open(filename, 'r'):
match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line)
if match is None or re.search('^(?:float|localhost)\.', line):
continue
hostname = match.group(1)
count = 0
for arg in sys.argv[1:]:
for section in hostname.split('.'):
if section == arg:
count = count + 1
break
if count == len(sys.argv) - 1:
hosts.append(hostname)

if len(hosts) == 1:
os.system("ssh -A " + hosts[0])
else:
print '\n'.join(hosts)
-- 
http://mail.python.org/mailman/listinfo/python-list