On Sunday 02 February 2003 12:46 am, magnet wrote:
> I have a large text file containing thousands of URLs, one per line, and
> am trying to find a suitable utility that will strip out identical lines
> and leave a condensed file. Can anyone suggest a good solution?
> Thanks :)
---------------------------------------------------------------------------
#!/usr/bin/env python
import sys, os

if len(sys.argv) <= 2:
        print "Usage is './duprem infile outfile'"
        sys.exit(1)

HOME=os.path.expanduser("~")

infile=sys.argv[1]
outfile=sys.argv[2]

def userhome(filename):
        # Leave paths that already point under $HOME alone, otherwise
        # treat them as relative to $HOME.  (os.path.join discards HOME
        # when filename is already an absolute path.)
        if filename.startswith(HOME):
                return filename
        else:
                return os.path.join(HOME, filename)

infile=userhome(infile)
outfile=userhome(outfile)

if not os.path.exists(infile):
        print "input file "+infile+" does not exist"
        sys.exit(2)

input=open(infile,"r")
output=open(outfile,"w")

G=[]                            # unique lines, in the order first seen

g=input.readline()
while len(g) > 0:
        i=0
        for x in G:
                if x == g:
                        i=1
                        print "duplicate "+g.rstrip()+" removed"
                        break
        if i == 0:
                G.append(g)
        g=input.readline()

for x in G:
        output.write(x)
input.close()
output.close()
print "complete"


-----------------------------------------------------------------

Well, put everything between the dashed lines into a text file called duprem in
your user space, make it executable with chmod a+x duprem, and then call it with

./duprem (fileofurlswithduplicates) (outputfilecleanedofdups)
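
With thousands of URLs, the linear scan through the list G can get slow, since
every new line is compared against everything kept so far. A dictionary lookup
does the same duplicate check in roughly constant time per line. Here is a
minimal sketch of that variant (the duprem2 name and the identical two-argument
usage are just assumptions for illustration; it skips the $HOME path handling):

#!/usr/bin/env python
# Sketch: remove duplicate lines, keeping the first occurrence of each,
# using a dictionary instead of scanning a list for every line.
import sys

if len(sys.argv) <= 2:
        print "Usage is './duprem2 infile outfile'"
        sys.exit(1)

seen = {}                       # lines already written to the output
input = open(sys.argv[1], "r")
output = open(sys.argv[2], "w")

for line in input.readlines():
        if not seen.has_key(line):      # first time this line appears
                seen[line] = 1
                output.write(line)

input.close()
output.close()
print "complete"

And if the order of the URLs doesn't matter, the standard

sort -u fileofurlswithduplicates > outputfilecleanedofdups

command does the whole job with no script at all, at the cost of sorting the output.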

Civileme



