On Jun 12, 10:10 pm, "John Salerno" <[EMAIL PROTECTED]> wrote:
> "Phillip B Oldham" <[EMAIL PROTECTED]> wrote in messagenews:[EMAIL PROTECTED]
>
> > I'd like the community's thoughts/comments on what I've done;
> > improvements I can make, "don'ts" I should be avoiding, etc. I'm not
> > so much bothered about the resulting data - for the moment it meets my
> > needs. But any comment is welcome!
>
> I'm not expert, but here are a few thoughts. I hope they help.
>
> > #!/usr/bin/env python
> > ## Open a file containing a list of domains (1 per line),
> > ## request and parse it's whois record and push to a csv
> > ## file.
>
> You might want to look into doc strings as a method of providing longer
> documentation like this about what your program does.
>
> > dest = open('./whois.csv', 'w');
>
> Semicolon!!!! :)
>
> > def trim( txt ):
> > x = []
> > for line in txt.split("\n"):
> > if line.strip() == "":
> > continue
> > if line.strip().startswith('WHOIS'):
> > continue
> > if line.strip().startswith('>>>'):
> > continue
> > if line.strip().startswith('%'):
> > continue
> > if line.startswith("--"):
> > return ''.join(x)
>
> Is all this properly indented? One thing you can do is put each of these on
> one line, since they are fairly simple:
>
> if line.strip().startswith('WHOIS'): continue
>
> although I still like proper indentation. But you have a lot of them so it
> might save a good amount of space to do it this way.
>
> Also, just my personal preference, I like to be consistent with the type of
> quotes I use for strings. Here, you mix both single and double quotes on
> different lines.
>
> > return "\n".join(x);
>
> Semicolon!!!!  :) :)
>
> > details = ['','','','','','','','','']
>
> I don't have Python available to me right now, but I think you can do this
> instead:
>
> details = [''] * 9

Be careful with this, as python's string is immutable, this is ok, but
if you're replicating a mutable item here, the result would be nasty.

>
> > except:
> > continue
>
> Non-specific except clauses usually aren't preferred since they catch
> everything, even something you might not want to catch.
>
> > if domain == '':
> > continue
>
> You can say:
>
> if not domain
>
> instead of that equivalence test. But what does this if statement do?
>
> > if rec.startswith("No whois server") == True:
> > continue
>
> > if rec.startswith("This TLD has no whois server") == True:
> > continue
>
> Like above, you don't need "== True" here.
>
> > if domain.endswith(".net"):
> > rec = clean_net(rec)
>
> > if domain.endswith(".com"):
> > rec = clean_net(rec)
>
> > if domain.endswith(".tv"):
> > rec = clean_net(rec)
>
> > if domain.endswith(".co.uk"):
> > rec = clean_co_uk(rec)
>
> > if domain.endswith(".info"):
> > rec = clean_info(rec)
>
> Hmm, my first thought is to do something like this with all these if tests:
>
> for extension in [<list all the extensions as strings here>]:
>     rec = clean_net(extension)
>
> But for that to work, you may need to generalize the clean_net function so
> it works for all of them, instead of having to call different functions
> depending on the extension.
>
> Anyway, I hope some of that helps!

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to