Re: Evaluate my first python script, please
On Mar 5, 10:19 am, "sjdevn...@yahoo.com" wrote: > On Mar 5, 10:53 am, Pete Emerson wrote: > > > > > > > Thanks for your response, further questions inline. > > > On Mar 4, 11:07 am, Tim Wintle wrote: > > > > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote: > > > > I am looking for advice along the lines of "an easier way to do this" > > > > or "a more python way" (I'm sure that's asking for trouble!) or > > > > "people commonly do this instead" or "here's a slick trick" or "oh, > > > > interesting, here's my version to do the same thing". > > > > (1) I would wrap it all in a function > > > > def main(): > > > # your code here > > > > if __name__ == "__main__": > > > main() > > > Is this purely aesthetic reasons, or will I appreciate this when I > > write my own modules, or something else? > > Suppose the above code is in mymodule.py. By wrapping main() you can: > 1. Have another module do: > import mymodule > ... (so some stuff, perhaps munge sys.argv) > mymodule.main() > 2. If mymodule has a small function in it, someone else can import it > and call that function > 3. You can run pylint, pychecker and other source-code checkers that > need to be able to import your module to check it (I wouldn't be > surprised if recent versions of one or the other of those don't > require imports, and some checkers like pyflakes certainly don't). > 4. You can easily have a unit tester call into the module > > etc. > > > > (2) PEP8 (python style guidelines) suggests one import per line > > > > (3) I'd use four spaces as tab width > > +1 on both; it's good to get into the habit of writing standard- > looking Python code. Agreed, noted, and appreciated, with the caveat that using spaces instead of tabs might border on an emacs vs. vi flamewar in some circles. I personally will use spaces going forward. -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Mar 5, 10:53 am, Pete Emerson wrote: > Thanks for your response, further questions inline. > > On Mar 4, 11:07 am, Tim Wintle wrote: > > > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote: > > > I am looking for advice along the lines of "an easier way to do this" > > > or "a more python way" (I'm sure that's asking for trouble!) or > > > "people commonly do this instead" or "here's a slick trick" or "oh, > > > interesting, here's my version to do the same thing". > > > (1) I would wrap it all in a function > > > def main(): > > # your code here > > > if __name__ == "__main__": > > main() > > Is this purely aesthetic reasons, or will I appreciate this when I > write my own modules, or something else? Suppose the above code is in mymodule.py. By wrapping main() you can: 1. Have another module do: import mymodule ... (so some stuff, perhaps munge sys.argv) mymodule.main() 2. If mymodule has a small function in it, someone else can import it and call that function 3. You can run pylint, pychecker and other source-code checkers that need to be able to import your module to check it (I wouldn't be surprised if recent versions of one or the other of those don't require imports, and some checkers like pyflakes certainly don't). 4. You can easily have a unit tester call into the module etc. > > (2) PEP8 (python style guidelines) suggests one import per line > > > (3) I'd use four spaces as tab width +1 on both; it's good to get into the habit of writing standard- looking Python code. -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Fri, 2010-03-05 at 07:53 -0800, Pete Emerson wrote: > Thanks for your response, further questions inline. > > On Mar 4, 11:07 am, Tim Wintle wrote: > > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote: > > > I am looking for advice along the lines of "an easier way to do this" > > > or "a more python way" (I'm sure that's asking for trouble!) or > > > "people commonly do this instead" or "here's a slick trick" or "oh, > > > interesting, here's my version to do the same thing". > > > > (1) I would wrap it all in a function > > > > def main(): > > # your code here > > > > if __name__ == "__main__": > > main() > > Is this purely aesthetic reasons, or will I appreciate this when I > write my own modules, or something else? It's for when you reuse this code. Consider it's in "mymodule.py" (so run with ./mymodule.py) - if you then make a "tests.py" (for example) you can "import mymodule" without it automatically running your code. re-writing it def main(args): #your code if __name__ == "__main__": main(sys.argv[1:]) would obviously be more sensible for actually writing tests. > > ... or you could replace whole section between the for loop and > > hosts.append with: > > > > if sorted(hostname.split(".")) == sorted(sys.argv[1:]): > > host.append(hostname) > > This doesn't actually work, because I'm not looking for a one to one > mapping of args to sections of hostname, but rather a subset. So > passing in 'prod sfo' would register a match for '001.webapp.prod.sfo'. Ah - good point - I guess the the set intersection technique someone else mentioned is best in that case. Tim -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Mar 5, 7:00 am, Duncan Booth wrote: > Jean-Michel Pichavant wrote: > > And tell me how not using regexp will ensure the /etc/hosts processing > > is correct ? The non regexp solutions provided in this thread did not > > handled what you rightfully pointed out about host list and commented > > lines. > > It won't make is automatically correct, but I'd guess that written without > being so dependent on regexes might have made someone point out those > deficiencies sooner. The point being that casual readers of the code won't > take the time to decode the regex, they'll glance over it and assume it > does something or other sensible. > > If I was writing that code, I'd read each line, strip off comments and > leading whitespace (so you can use re.match instead of re.search), split on > whitespace and take all but the first field. I might check that the field > I'm ignoring it something like a numeric ip address, but if I did want to > do then I'd include range checking for valid octets so still no regex. > > The whole of that I'd wrap in a generator so what you get back is a > sequence of host names. > > However that's just me. I'm not averse to regular expressions, I've written > some real mammoths from time to time, but I do avoid them when there are > simpler clearer alternatives. > > > And FYI, the OP pattern does match '192.168.200.1 (foo123)' > > ... > > Ok that's totally unfair :D You're right I made a mistake. Still the > > comment is absolutely required (provided it's correct). > > Yes, the comment would have been good had it been correct. I'd also go for > a named group as that provides additional context within the regex. > > Also if there are several similar regular expressions in the code, or if > they get too complex I'd build them up in parts. e.g. > > OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])' > ADDRESS = (OCTET + r'\.') * 3 + OCTET > HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*' > # could use \S+ but my Linux manual says > # alphanumeric, dash and dots only > ... and so on ... > > which provides another way of documenting the intentions of the regex. > > BTW, I'm not advocating that here, the above patterns would be overkill, > but in more complex situations thats what I'd do. > > -- > Duncan Boothhttp://kupuguy.blogspot.com All good comments here. The takeaway for my lazy style of regexes (which makes it harder for non-regex fiends to read, regardless of the language) is that there are ways to make regexes much more readable to the untrained eye. Duncan, I like your method of defining sections of the regex outside the regex itself, even if it's a one time use. -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Thanks for your response, further questions inline. On Mar 4, 11:07 am, Tim Wintle wrote: > On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote: > > I am looking for advice along the lines of "an easier way to do this" > > or "a more python way" (I'm sure that's asking for trouble!) or > > "people commonly do this instead" or "here's a slick trick" or "oh, > > interesting, here's my version to do the same thing". > > (1) I would wrap it all in a function > > def main(): > # your code here > > if __name__ == "__main__": > main() Is this purely aesthetic reasons, or will I appreciate this when I write my own modules, or something else? > > (2) PEP8 (python style guidelines) suggests one import per line > > (3) I'd use four spaces as tab width > > (4) > I'd change this: > > > for arg in sys.argv[1:]: > > for section in hostname.split('.'): > > if section == arg: > > count = count + 1 > > break > > to something more like: > > for section in hostname.split("."): > if section in sys.argv[1:]: > count += 1 Ah, yes, I like that. It moves towards the set notation I've wound up with. Definitely more readable to me. > > (although as you suggested I'd only calculate sys.argv[1:] once) > > ... or you could replace whole section between the for loop and > hosts.append with: > > if sorted(hostname.split(".")) == sorted(sys.argv[1:]): > host.append(hostname) This doesn't actually work, because I'm not looking for a one to one mapping of args to sections of hostname, but rather a subset. So passing in 'prod sfo' would register a match for '001.webapp.prod.sfo'. -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Jean-Michel Pichavant wrote: > And tell me how not using regexp will ensure the /etc/hosts processing > is correct ? The non regexp solutions provided in this thread did not > handled what you rightfully pointed out about host list and commented > lines. It won't make is automatically correct, but I'd guess that written without being so dependent on regexes might have made someone point out those deficiencies sooner. The point being that casual readers of the code won't take the time to decode the regex, they'll glance over it and assume it does something or other sensible. If I was writing that code, I'd read each line, strip off comments and leading whitespace (so you can use re.match instead of re.search), split on whitespace and take all but the first field. I might check that the field I'm ignoring it something like a numeric ip address, but if I did want to do then I'd include range checking for valid octets so still no regex. The whole of that I'd wrap in a generator so what you get back is a sequence of host names. However that's just me. I'm not averse to regular expressions, I've written some real mammoths from time to time, but I do avoid them when there are simpler clearer alternatives. > And FYI, the OP pattern does match '192.168.200.1 (foo123)' > ... > Ok that's totally unfair :D You're right I made a mistake. Still the > comment is absolutely required (provided it's correct). > Yes, the comment would have been good had it been correct. I'd also go for a named group as that provides additional context within the regex. Also if there are several similar regular expressions in the code, or if they get too complex I'd build them up in parts. e.g. OCTET = r'(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])' ADDRESS = (OCTET + r'\.') * 3 + OCTET HOSTNAME = r'[-a-zA-Z0-9]+(?:\.[-a-zA-Z0-9]+)*' # could use \S+ but my Linux manual says # alphanumeric, dash and dots only ... and so on ... which provides another way of documenting the intentions of the regex. BTW, I'm not advocating that here, the above patterns would be overkill, but in more complex situations thats what I'd do. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Duncan Booth wrote: Jean-Michel Pichavant wrote: You've already been given good advices. Unlike some of those, I don't think using regexp an issue in any way. Yes they are not readable, for sure, I 100% agree to that statement, but you can make them readable very easily. # not readable match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # readable match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match '192.168.200.1 (foo123)' I gladly understand ppl that are not willing to use regexp (or as last resort), but for a perl guy, that should not be an issue. The problem with your comment is that just by existing it means people will be misled into thinking the regex is being used to match strings like '192.168.200.1 (foo123)' when in fact the OP is expecting to match strings like '192.168.200.1 foo123' i.e. you may mislead some people into thinking the parentheses are part of what is being matched when they're actually grouping within the regex. Another problem with either regular expression line is that it makes it less easy to see that it doesn't do what the OP wanted. /etc/hosts contains lines with an IP address followed by multiple hostnames. It may also contain comments either as complete lines or at the end of lines. The code given will miss hostnames after the first and will include hostnames in commented out lines. And tell me how not using regexp will ensure the /etc/hosts processing is correct ? The non regexp solutions provided in this thread did not handled what you rightfully pointed out about host list and commented lines. And FYI, the OP pattern does match '192.168.200.1 (foo123)' ... Ok that's totally unfair :D You're right I made a mistake. Still the comment is absolutely required (provided it's correct). JM -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Jean-Michel Pichavant wrote: > You've already been given good advices. > Unlike some of those, I don't think using regexp an issue in any way. > Yes they are not readable, for sure, I 100% agree to that statement, but > you can make them readable very easily. > > # not readable > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > > > # readable > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match > '192.168.200.1 (foo123)' > > I gladly understand ppl that are not willing to use regexp (or as last > resort), but for a perl guy, that should not be an issue. The problem with your comment is that just by existing it means people will be misled into thinking the regex is being used to match strings like '192.168.200.1 (foo123)' when in fact the OP is expecting to match strings like '192.168.200.1 foo123' i.e. you may mislead some people into thinking the parentheses are part of what is being matched when they're actually grouping within the regex. Another problem with either regular expression line is that it makes it less easy to see that it doesn't do what the OP wanted. /etc/hosts contains lines with an IP address followed by multiple hostnames. It may also contain comments either as complete lines or at the end of lines. The code given will miss hostnames after the first and will include hostnames in commented out lines. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Pete Emerson wrote: I've written my first python program, and would love suggestions for improvement. I'm a perl programmer and used a perl version of this program to guide me. So in that sense, the python is "perlesque" This script parses /etc/hosts for hostnames, and based on terms given on the command line (argv), either prints the list of hostnames that match all the criteria, or uses ssh to connect to the host if the number of matches is unique. I am looking for advice along the lines of "an easier way to do this" or "a more python way" (I'm sure that's asking for trouble!) or "people commonly do this instead" or "here's a slick trick" or "oh, interesting, here's my version to do the same thing". I am aware that there are performance improvements and error checking that could be made, such as making sure the file exists and is readable and precompiling the regular expressions and not calculating how many sys.argv arguments there are more than once. I'm not hyper concerned with performance or idiot proofing for this particular script. Thanks in advance. #!/usr/bin/python import sys, fileinput, re, os filename = '/etc/hosts' hosts = [] for line in open(filename, 'r'): match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) if match is None or re.search('^(?:float|localhost)\.', line): continue hostname = match.group(1) count = 0 for arg in sys.argv[1:]: for section in hostname.split('.'): if section == arg: count = count + 1 break if count == len(sys.argv) - 1: hosts.append(hostname) if len(hosts) == 1: os.system("ssh -A " + hosts[0]) else: print '\n'.join(hosts) You've already been given good advices. Unlike some of those, I don't think using regexp an issue in any way. Yes they are not readable, for sure, I 100% agree to that statement, but you can make them readable very easily. # not readable match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # readable match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) # match '192.168.200.1 (foo123)' I gladly understand ppl that are not willing to use regexp (or as last resort), but for a perl guy, that should not be an issue. JM -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Thu, 2010-03-04 at 10:39 -0800, Pete Emerson wrote: > I am looking for advice along the lines of "an easier way to do this" > or "a more python way" (I'm sure that's asking for trouble!) or > "people commonly do this instead" or "here's a slick trick" or "oh, > interesting, here's my version to do the same thing". (1) I would wrap it all in a function def main(): # your code here if __name__ == "__main__": main() (2) PEP8 (python style guidelines) suggests one import per line (3) I'd use four spaces as tab width (4) I'd change this: > for arg in sys.argv[1:]: > for section in hostname.split('.'): > if section == arg: > count = count + 1 > break to something more like: for section in hostname.split("."): if section in sys.argv[1:]: count += 1 (although as you suggested I'd only calculate sys.argv[1:] once) ... or you could replace whole section between the for loop and hosts.append with: if sorted(hostname.split(".")) == sorted(sys.argv[1:]): host.append(hostname) , at a slight loss of clarity - but I think I'd stick with the more verbose version personally. Tim -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Great responses, thank you all very much. I read Jonathan Gardner's solution first and investigated sets. It's clearly superior to my first cut. I love the comment about regular expressions. In perl, I've reached for regexes WAY too much. That's a big lesson learned too, and from my point of view not because of performance (although that's most likely a bonus) but because of code readability. Here's the version I have now. It looks quite similar to sjdevnull's. Further comments appreciated, and thanks again. #!/usr/bin/env python import sys, fileinput, os filename = '/etc/hosts' hosts = [] search_terms = set(sys.argv[1:]) for line in open(filename, 'r'): if line.startswith('#'): continue try: hostname = line.strip().split('\t')[2] # The host I want is always at the 3rd tab. except IndexError: continue if hostname.startswith('float.') or hostname.startswith('localhost.'): continue if search_terms <= set(hostname.split('.')): # same as if search_terms.issubset(hostname.split('.')): hosts.append(hostname) if len(hosts) == 1: os.execl("/usr/bin/ssh", '-A', hosts[0]) else: for host in hosts: print host -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Mar 4, 1:39 pm, Pete Emerson wrote: > I've written my first python program, and would love suggestions for > improvement. > > I'm a perl programmer and used a perl version of this program to guide > me. So in that sense, the python is "perlesque" > > This script parses /etc/hosts for hostnames, and based on terms given > on the command line (argv), either prints the list of hostnames that > match all the criteria, or uses ssh to connect to the host if the > number of matches is unique. > > I am looking for advice along the lines of "an easier way to do this" > or "a more python way" (I'm sure that's asking for trouble!) or > "people commonly do this instead" or "here's a slick trick" or "oh, > interesting, here's my version to do the same thing". > > I am aware that there are performance improvements and error checking > that could be made, such as making sure the file exists and is > readable and precompiling the regular expressions and not calculating > how many sys.argv arguments there are more than once. I'm not hyper > concerned with performance or idiot proofing for this particular > script. > > Thanks in advance. > > > #!/usr/bin/python > > import sys, fileinput, re, os 'Some people, when confronted with a problem, think "I know, I’ll use regular expressions." Now they have two problems.' — Jamie Zawinski Seriously, regexes can be very useful but there's no need for them here. Simpler is usually better, and easier to understand. > filename = '/etc/hosts' > > hosts = [] > > for line in open(filename, 'r'): > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > if match is None or re.search('^(?:float|localhost)\.', line): > continue > hostname = match.group(1) I find this much clearer without regexes: try: ip, hostname = line.strip().split(None, 1) except IndexError: continue # I think this is equivalent to your re, but I'm not sure it's what # you actually meant... #if line.startswith("float.") or line.startswith("localhost."): #continue # I'm going with: if hostname.startswith("float.") or hostname.startswith("localhost"): continue > count = 0 > for arg in sys.argv[1:]: > for section in hostname.split('.'): > if section == arg: > count = count + 1 > break > if count == len(sys.argv) - 1: > hosts.append(hostname) A perfect application of sets. #initialize at program outset args = set(sys.argv[1:]) ... hostparts = set(hostname.split(".")) if hostparts & args: hosts.append(hostname) Full program: import sys import os filename = '/etc/hosts' hosts = [] args = set(sys.argv[1:]) for line in open(filename, 'r'): # Parse line into ip address and hostname, skipping bogus lines try: ipaddr, hostname = line.strip().split(None, 1) except ValueError: continue if hostname.startswith("float.") or hostname.startswith("localhost."): continue # Add to hosts if it matches at least one argument hostparts = set(hostname.split(".")) if hostparts & args: hosts.append(hostname) # If there's only one match, ssh to it--otherwise print out the matches if len(hosts) == 1: os.system("ssh -A %s"%hosts[0]) else: for host in hosts: print host -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Thu, Mar 4, 2010 at 10:39 AM, Pete Emerson wrote: > > #!/usr/bin/python > More common: #!/usr/bin/env python > import sys, fileinput, re, os > > filename = '/etc/hosts' > > hosts = [] > > for line in open(filename, 'r'): > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > if match is None or re.search('^(?:float|localhost)\.', line): > continue > hostname = match.group(1) In Python, we use REs as a last resort. See "dir("")" or "help("")" for why we don't use REs if we can avoid them. The above is better represented with: try: ipaddr, hostnames = line.split(None, 1) except IndexError: continue if line.find('float') >=0 or line.find('localhost') >= 0: continue > count = 0 > for arg in sys.argv[1:]: > for section in hostname.split('.'): > if section == arg: > count = count + 1 > break > if count == len(sys.argv) - 1: > hosts.append(hostname) > You can use the "else" in a "for". as well. for arg in sys.argv[1:]: for section in hostname.split('.'): if section == arg: break else: # Run only if for completes naturally hosts.append(hostname) It may be clearer to do set arithmetic as well. > if len(hosts) == 1: > os.system("ssh -A " + hosts[0]) You probably want one of the os.exec* methods instead, since you aren't going to do anything after this. > else: > print '\n'.join(hosts) Rather, the idiom is usually: for host in hosts: print host -- Jonathan Gardner jgard...@jonathangardner.net -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
On Mar 4, 2:30 pm, MRAB wrote: > Pete Emerson wrote: > > I've written my first python program, and would love suggestions for > > improvement. > > > I'm a perl programmer and used a perl version of this program to guide > > me. So in that sense, the python is "perlesque" > > > This script parses /etc/hosts for hostnames, and based on terms given > > on the command line (argv), either prints the list of hostnames that > > match all the criteria, or uses ssh to connect to the host if the > > number of matches is unique. > > > I am looking for advice along the lines of "an easier way to do this" > > or "a more python way" (I'm sure that's asking for trouble!) or > > "people commonly do this instead" or "here's a slick trick" or "oh, > > interesting, here's my version to do the same thing". > > > I am aware that there are performance improvements and error checking > > that could be made, such as making sure the file exists and is > > readable and precompiling the regular expressions and not calculating > > how many sys.argv arguments there are more than once. I'm not hyper > > concerned with performance or idiot proofing for this particular > > script. > > > Thanks in advance. > > > > > #!/usr/bin/python > > > import sys, fileinput, re, os > > > filename = '/etc/hosts' > > > hosts = [] > > > for line in open(filename, 'r'): > > match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > > Use 'raw' strings for regular expressions. > > 'Normal' Python string literals use backslashes in escape sequences (eg, > '\n'), but so do regular expressions, and regular expressions are passed > to the 're' module in strings, so that can lead to a profusion of > backslashes! > > You could either escape the backslashes: > > match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line) > > or use a raw string: > > match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line) > > A raw string literal is just like a normal string literal, except that > backslashes aren't special. > > > if match is None or re.search('^(?:float|localhost)\.', line): > > continue > > It would be more 'Pythonic' to say "not match". > > if not match or re.search(r'^(?:float|localhost)\.', line): > > > hostname = match.group(1) > > count = 0 > > for arg in sys.argv[1:]: > > for section in hostname.split('.'): > > if section == arg: > > count = count + 1 > > Shorter is: > > count += 1 > > > break > > if count == len(sys.argv) - 1: > > hosts.append(hostname) > > > if len(hosts) == 1: > > os.system("ssh -A " + hosts[0]) > > else: > > print '\n'.join(hosts) > > You're splitting the hostname repeatedly. You could split it just once, > outside the outer loop, and maybe make it into a set. You could then > have: > > host_parts = set(hostname.split('.')) > count = 0 > for arg in sys.argv[1:]: > if arg in host_parts: > count += 1 > > Incidentally, the re module contains a cache, so the regular expressions > won't be recompiled every time (unless you have a lot of them and the re > module flushes the cache). I think that you really just need a set intersection for the count part and this should work: host_parts = set(hostname.split('.')) count = len(host_parts.intersection(sys.argv[1:])) -- http://mail.python.org/mailman/listinfo/python-list
Re: Evaluate my first python script, please
Pete Emerson wrote: I've written my first python program, and would love suggestions for improvement. I'm a perl programmer and used a perl version of this program to guide me. So in that sense, the python is "perlesque" This script parses /etc/hosts for hostnames, and based on terms given on the command line (argv), either prints the list of hostnames that match all the criteria, or uses ssh to connect to the host if the number of matches is unique. I am looking for advice along the lines of "an easier way to do this" or "a more python way" (I'm sure that's asking for trouble!) or "people commonly do this instead" or "here's a slick trick" or "oh, interesting, here's my version to do the same thing". I am aware that there are performance improvements and error checking that could be made, such as making sure the file exists and is readable and precompiling the regular expressions and not calculating how many sys.argv arguments there are more than once. I'm not hyper concerned with performance or idiot proofing for this particular script. Thanks in advance. #!/usr/bin/python import sys, fileinput, re, os filename = '/etc/hosts' hosts = [] for line in open(filename, 'r'): match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) Use 'raw' strings for regular expressions. 'Normal' Python string literals use backslashes in escape sequences (eg, '\n'), but so do regular expressions, and regular expressions are passed to the 're' module in strings, so that can lead to a profusion of backslashes! You could either escape the backslashes: match = re.search('\\d+\\.\\d+\\.\\d+\\.\\d+\s+(\\S+)', line) or use a raw string: match = re.search(r'\d+\.\d+\.\d+\.\d+\s+(\S+)', line) A raw string literal is just like a normal string literal, except that backslashes aren't special. if match is None or re.search('^(?:float|localhost)\.', line): continue It would be more 'Pythonic' to say "not match". if not match or re.search(r'^(?:float|localhost)\.', line): hostname = match.group(1) count = 0 for arg in sys.argv[1:]: for section in hostname.split('.'): if section == arg: count = count + 1 Shorter is: count += 1 break if count == len(sys.argv) - 1: hosts.append(hostname) if len(hosts) == 1: os.system("ssh -A " + hosts[0]) else: print '\n'.join(hosts) You're splitting the hostname repeatedly. You could split it just once, outside the outer loop, and maybe make it into a set. You could then have: host_parts = set(hostname.split('.')) count = 0 for arg in sys.argv[1:]: if arg in host_parts: count += 1 Incidentally, the re module contains a cache, so the regular expressions won't be recompiled every time (unless you have a lot of them and the re module flushes the cache). -- http://mail.python.org/mailman/listinfo/python-list
Evaluate my first python script, please
I've written my first python program, and would love suggestions for improvement. I'm a perl programmer and used a perl version of this program to guide me. So in that sense, the python is "perlesque" This script parses /etc/hosts for hostnames, and based on terms given on the command line (argv), either prints the list of hostnames that match all the criteria, or uses ssh to connect to the host if the number of matches is unique. I am looking for advice along the lines of "an easier way to do this" or "a more python way" (I'm sure that's asking for trouble!) or "people commonly do this instead" or "here's a slick trick" or "oh, interesting, here's my version to do the same thing". I am aware that there are performance improvements and error checking that could be made, such as making sure the file exists and is readable and precompiling the regular expressions and not calculating how many sys.argv arguments there are more than once. I'm not hyper concerned with performance or idiot proofing for this particular script. Thanks in advance. #!/usr/bin/python import sys, fileinput, re, os filename = '/etc/hosts' hosts = [] for line in open(filename, 'r'): match = re.search('\d+\.\d+\.\d+\.\d+\s+(\S+)', line) if match is None or re.search('^(?:float|localhost)\.', line): continue hostname = match.group(1) count = 0 for arg in sys.argv[1:]: for section in hostname.split('.'): if section == arg: count = count + 1 break if count == len(sys.argv) - 1: hosts.append(hostname) if len(hosts) == 1: os.system("ssh -A " + hosts[0]) else: print '\n'.join(hosts) -- http://mail.python.org/mailman/listinfo/python-list