On Sunday 02 February 2003 12:46 am, magnet wrote:
> I have a large text file containing thousands of url's, one per line, and
> am trying to find a suitable utility that will strip out identical lines
> and leave a condensed file. Can anyone suggest a good solution?
> Thanks :)
---------------------------------------------------------------------------
#!/usr/bin/env python
import sys, os

# Expect exactly two arguments: the input file and the output file.
if len(sys.argv) <= 2:
    print "Usage is './duprem infile outfile'"
    sys.exit(1)

HOME = os.path.expanduser("~")
infile = sys.argv[1]
outfile = sys.argv[2]

def userhome(filename):
    # Leave the name alone if it already starts with the home directory,
    # otherwise treat it as relative to the user's home.
    if filename.startswith(HOME):
        return filename
    else:
        return os.path.join(HOME, filename)

infile = userhome(infile)
outfile = userhome(outfile)

if not os.path.exists(infile):
    print "input file " + infile + " does not exist"
    sys.exit(2)

input = open(infile, "r")
output = open(outfile, "w")

G = []                              # unique lines, in original order
g = input.readline()
while len(g) > 0:
    if g in G:
        print "duplicate " + g.strip() + " removed"
    else:
        G.append(g)
    g = input.readline()

for x in G:
    output.write(x)
output.close()
print "complete"
-----------------------------------------------------------------
Well, put everything between the dashed lines into a text file called
duprem in your user space, then

    chmod a+x duprem

and call it with

    ./duprem (fileofurlswithduplicates) (outputfilecleanedofdups)

Civileme
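P.S. With thousands of URLs the list scan above gets slow, because every new
line is compared against everything kept so far. A rough sketch of the same
job using a dictionary as a set (so each duplicate check is a single hash
lookup) could look like this; it takes the same two arguments:

-----------------------------------------------------------------
#!/usr/bin/env python
# Same idea as duprem, but a dictionary used as a set makes each
# duplicate check a hash lookup instead of a scan of the whole list.
import sys

infile = sys.argv[1]
outfile = sys.argv[2]

seen = {}
output = open(outfile, "w")
for line in open(infile, "r"):
    if line not in seen:
        seen[line] = 1
        output.write(line)
output.close()
-----------------------------------------------------------------

Both versions keep the lines in their original order and only drop the
repeats.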