Fredrik Lundh wrote:
> "fynali" wrote:
>
>>>Objective: to remove the numbers present in barred-list from the
>>>PSPfile.
>>>
>>>    $ ls -lh PSP0000320.dat CBR0000319.dat
>>>    ... 56M Dec 28 19:41 PSP0000320.dat
>>>    ... 8.6M Dec 28 19:40 CBR0000319.dat
>>>
>>>    $ wc -l PSP0000320.dat CBR0000319.dat
>>>    4,462,603 PSP0000320.dat
>>>    693,585 CBR0000319.dat
>>>
>>>I wrote the following in python to do it:
>>>
>>>    #: c01:rmcommon.py
>>>    barredlist = open(r'/home/sjd/python/wip/CBR0000319.dat', 'r')
>>>    postlist = open(r'/home/sjd/python/wip/PSP0000320.dat', 'r')
>>>    outfile = open(r'/home/sjd/python/wip/PSP-CBR.dat', 'w')
>>>
>>>    # reading it all in one go, so as to avoid frequent disk accesses
>>>    # (assume machine has plenty memory)
>>>    barredlist.read()
>>>    postlist.read()
>>>
>>>    #
>>>    for number in postlist:
>>>        if number in barrlist:
>>>            pass
>>>        else:
>>>            outfile.write(number)
>>>
>>>    barredlist.close(); postlist.close(); outfile.close()
>>>    #:~
>>>
>>>The above code simply takes too long to complete.
>
> the above code doesn't even run.
>
> (why is it that nobody remembers how to use cut and paste these
> days?  has it perhaps been banned in some part of the world,
> without me noticing)
>
> this might work a little better:
>
>     barred = set(open('/home/sjd/python/wip/CBR0000319.dat'))
>
>     infile = open('/home/sjd/python/wip/PSP0000320.dat')
>     outfile = open('/home/sjd/python/wip/PSP-CBR.dat', 'w')
>
>     for number in infile:
>         if number not in barred:
>             outfile.write(number)
>
> if you feel adventurous, you can replace the for/if loop with
>
>     outfile.writelines(number for number in infile if number not in barred)
>
> :::
>
> tim wrote:
>
>>It should be quicker to do this
>>
>>    #
>>    for number in postlist:
>>        if not number in barrlist:
>>            outfile.write(number)
>>
>>and quicker doing this
>>
>>    #
>>    numbers = [number for number in postlist if not number in barrlist]
>>    outfile.write(''.join(numbers))
>
> looks like premature non-optimization to me...

It might be quicker to establish a dict whose keys are the barred
numbers and use that, rather than a list, to determine whether the
input numbers should make it through.
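
Something along these lines is what I have in mind. It is only a rough sketch, not tested against the real files: the function name remove_barred and the sample numbers are made up for illustration, and the lines are compared with their trailing newlines intact, as in the code quoted above. A set, as in Fredrik's version, should give the same hashed lookup.

```python
def remove_barred(post_lines, barred_lines):
    """Yield each line of post_lines that is not a key of the barred dict.

    remove_barred is a hypothetical helper name, not part of the
    original script.
    """
    # A dict key lookup is a hash probe, so each membership test takes
    # roughly constant time instead of scanning a whole list.
    barred = dict.fromkeys(barred_lines)
    for line in post_lines:
        if line not in barred:
            yield line


if __name__ == "__main__":
    # Made-up sample data standing in for the two .dat files.
    barred_lines = ["1002\n", "1004\n"]
    post_lines = ["1001\n", "1002\n", "1003\n", "1004\n", "1005\n"]
    for line in remove_barred(post_lines, barred_lines):
        print(line, end="")
```

With the real data you would presumably pass the two open file objects in place of the lists and hand the result to outfile.writelines().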

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/