On Jun 12, 10:10 pm, "John Salerno" <[EMAIL PROTECTED]> wrote:
> "Phillip B Oldham" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
> > I'd like the community's thoughts/comments on what I've done;
> > improvements I can make, "don'ts" I should be avoiding, etc. I'm not
> > so much bothered about the resulting data - for the moment it meets my
> > needs. But any comment is welcome!
>
> I'm not expert, but here are a few thoughts. I hope they help.
>
> > #!/usr/bin/env python
> > ## Open a file containing a list of domains (1 per line),
> > ## request and parse it's whois record and push to a csv
> > ## file.
>
> You might want to look into doc strings as a method of providing longer
> documentation like this about what your program does.
>
> > dest = open('./whois.csv', 'w');
>
> Semicolon!!!! :)
>
> > def trim( txt ):
> >     x = []
> >     for line in txt.split("\n"):
> >         if line.strip() == "":
> >             continue
> >         if line.strip().startswith('WHOIS'):
> >             continue
> >         if line.strip().startswith('>>>'):
> >             continue
> >         if line.strip().startswith('%'):
> >             continue
> >         if line.startswith("--"):
> >             return ''.join(x)
>
> Is all this properly indented? One thing you can do is put each of these on
> one line, since they are fairly simple:
>
> if line.strip().startswith('WHOIS'): continue
>
> although I still like proper indentation. But you have a lot of them so it
> might save a good amount of space to do it this way.
>
> Also, just my personal preference, I like to be consistent with the type of
> quotes I use for strings. Here, you mix both single and double quotes on
> different lines.
>
> >     return "\n".join(x);
>
> Semicolon!!!! :) :)
>
> > details = ['','','','','','','','','']
>
> I don't have Python available to me right now, but I think you can do this
> instead:
>
> details = [''] * 9

Be careful with this. It is fine here because Python strings are
immutable, but if you replicate a mutable item this way (a list, for
example), every element of the result refers to the same object, so
changing one appears to change them all, and the result would be nasty.

> > except:
> >     continue
>
> Non-specific except clauses usually aren't preferred since they catch
> everything, even something you might not want to catch.
>
> > if domain == '':
> >     continue
>
> You can say:
>
> if not domain
>
> instead of that equivalence test. But what does this if statement do?
>
> > if rec.startswith("No whois server") == True:
> >     continue
>
> > if rec.startswith("This TLD has no whois server") == True:
> >     continue
>
> Like above, you don't need "== True" here.
>
> > if domain.endswith(".net"):
> >     rec = clean_net(rec)
>
> > if domain.endswith(".com"):
> >     rec = clean_net(rec)
>
> > if domain.endswith(".tv"):
> >     rec = clean_net(rec)
>
> > if domain.endswith(".co.uk"):
> >     rec = clean_co_uk(rec)
>
> > if domain.endswith(".info"):
> >     rec = clean_info(rec)
>
> Hmm, my first thought is to do something like this with all these if tests:
>
> for extension in [<list all the extensions as strings here>]:
>     rec = clean_net(extension)
>
> But for that to work, you may need to generalize the clean_net function so
> it works for all of them, instead of having to call different functions
> depending on the extension.
>
> Anyway, I hope some of that helps!
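
To make the caution about [''] * 9 concrete, here is a small sketch of
what goes wrong when the replicated item is mutable. The names shared
and independent, and the sample values, are made up for illustration;
they are not from the original script.

```python
# Replicating an immutable value is safe: assigning to one slot
# rebinds only that slot.
details = [''] * 9
details[0] = 'example.com'   # hypothetical value, for illustration only

# Replicating a mutable value copies the reference, not the object,
# so all three slots point at the same inner list.
shared = [[]] * 3
shared[0].append('x')

# A list comprehension evaluates [] once per slot, giving three
# separate inner lists.
independent = [[] for _ in range(3)]
independent[0].append('x')

print(details[:2])    # ['example.com', '']
print(shared)         # [['x'], ['x'], ['x']]
print(independent)    # [['x'], [], []]
```

So the multiplication form is fine for a list of strings or numbers,
and the comprehension is the safer habit for a list of lists or dicts.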