>> Here, domain name doesn't contain subdomain, or should I
>> say, domain's part of 'www', mail, news and en should be
>> excluded.
>
> It's a little more complicated, you have to treat co.uk about
> the same way as .com, and similarly for some other countries
> but not all.  For example, subdomain.companyname.de versus
> subdomain.companyname.com.au or subdomain.companyname.co.uk.
> You end up needing a table or special code to say how to treat
> various countries.

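For what it's worth, the "table" approach might be sketched roughly like this. The suffix set is a tiny hand-picked assumption on my part (a real one would need many more entries), and base_domain is just a name I made up for illustration:

```python
# Hypothetical, deliberately incomplete table of two-part suffixes
# that should be treated the same way as a plain ".com".
TWO_PART_SUFFIXES = {"co.uk", "com.au", "org.uk", "co.jp"}

def base_domain(host):
    """Return (name, suffix) for a hostname, dropping any subdomains."""
    parts = host.lower().split(".")
    # Use a two-part suffix if the last two labels are in the table,
    # otherwise fall back to the last label alone.
    if len(parts) >= 3 and ".".join(parts[-2:]) in TWO_PART_SUFFIXES:
        n = 2
    else:
        n = 1
    suffix = ".".join(parts[-n:])
    # The label just left of the suffix is the "base" name (if any).
    name = parts[-n - 1] if len(parts) > n else ""
    return name, suffix
```

With that table, 'subdomain.companyname.co.uk' comes back as ('companyname', 'co.uk') and 'subdomain.companyname.de' as ('companyname', 'de'). Anything missing from the table is silently treated like an ordinary one-part TLD, which is exactly where such a scheme tends to go wrong.
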
In addition, you get very different results even for the same
"base" domain name, such as "whitehouse", depending on whether
you use the ".gov" or ".com" TLD.  Thus, I'm not sure there's
any way to distinguish that case from the "yahoo.com" vs.
"yahoo.co.uk" case without doing a boatload of WHOIS queries,
which in turn might be misleading anyway.

A first-pass solution might look something like:

##############################################################
>>> sites
['http://mail.google.com', 'http://reader.google.com',
'http://mail.yahoo.co.uk', 'http://google.com',
'http://mail.yahoo.com']
>>> prefix = 'http://'
>>> sitebits = [site.lower()[len(prefix):].split('.') for site in sites]
>>> for site in sitebits: site.reverse()
...
>>> sorted(sitebits)
[['com', 'google'], ['com', 'google', 'mail'], ['com', 'google',
'reader'], ['com', 'yahoo', 'mail'], ['uk', 'co', 'yahoo', 'mail']]
>>> results = ['http://' + ('.'.join(reversed(site))) for site in sorted(sitebits)]
>>> results
['http://google.com', 'http://mail.google.com',
'http://reader.google.com', 'http://mail.yahoo.com',
'http://mail.yahoo.co.uk']
##############################################################

which can be wrapped up like this:

##############################################################
>>> def sort_by_domain(sites):
...     prefix = 'http://'
...     sitebits = [site.lower()[len(prefix):].split('.') for site in sites]
...     for site in sitebits: site.reverse()
...     return ['http://' + ('.'.join(reversed(site))) for site in sorted(sitebits)]
...
>>> sort_by_domain(sites)
['http://google.com', 'http://mail.google.com',
'http://reader.google.com', 'http://mail.yahoo.com',
'http://mail.yahoo.co.uk']
##############################################################

to give you a sorting function.  The prefix is sliced off rather
than removed with lstrip('http://'), because lstrip strips any
leading run of those characters and would turn 'http://php.net'
into '.net'.  The function assumes every URL starts with
"http://" rather than having mixed URL types, such as ftp or
mailto.  Those are easy enough to strip off as well, but putting
them back on takes a little more work.

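If you do have mixed URL types, one way around the "putting them back on" problem is to never take them off: sort the original strings with a key function instead.  A minimal sketch, assuming Python 3's urllib.parse is acceptable (sort_by_domain_any_scheme is a made-up name, not anything standard):

```python
from urllib.parse import urlsplit

def sort_by_domain_any_scheme(urls):
    """Sort URLs by reversed hostname, keeping each URL's own scheme."""
    def key(url):
        # urlsplit().hostname is lowercased and leaves out any port or
        # credentials; it is None for URLs with no "//host" part
        # (e.g. mailto:), so fall back to an empty string there.
        host = urlsplit(url).hostname or ""
        return host.split(".")[::-1]
    # Sorting the original strings by a key means nothing has to be
    # stripped off and glued back on afterwards.
    return sorted(urls, key=key)
```

Given 'ftp://reader.google.com' and 'https://mail.yahoo.co.uk' mixed in with the http URLs, each comes back in the same position as before but with its own scheme intact.  URLs without a hostname, such as mailto: addresses, all sort to the front, so those would still need separate handling.
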
Just a few ideas,

-tkc

--
http://mail.python.org/mailman/listinfo/python-list