On 12/01/06, Tim Williams (gmail) <[EMAIL PROTECTED]> wrote:
I forgot to add this one
for num in (number for number in postlist if number not in barrlist):
    outfile.write(num)
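The generator expression can also be passed straight to `writelines`, which consumes it one item at a time without building an intermediate list. This is a minimal sketch. It assumes `postlist` and `barrlist` are lists of newline-terminated strings, as `readlines()` would return, and the sample numbers are made up:

```python
import io

# Hypothetical sample data, shaped like the result of readlines() (newline kept).
postlist = ["96653696338\n", "96651131135\n", "96653766996\n"]
barrlist = ["96651131135\n", "96651420412\n"]

outfile = io.StringIO()  # in-memory stand-in for the real output file
# writelines() pulls items from the generator one at a time; no intermediate list.
outfile.writelines(number for number in postlist if number not in barrlist)
result = outfile.getvalue()
print(result, end="")
```

This only changes how the lines reach the file. Each `not in barrlist` test is still a scan of a list.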
On 12 Jan 2006 09:04:21 -0800, fynali <[EMAIL PROTECTED]> wrote:
Hi all,
I have two files:
- PSP0000320.dat (quite a large list of mobile numbers),
- CBR0000319.dat (a subset of the above, a list of barred numbers)
# head PSP0000320.dat CBR0000319.dat
==> PSP0000320.dat <==
96653696338
96653766996
96654609431
96654722608
96654738074
96655697044
96655824738
96656190117
96656256762
96656263751
==> CBR0000319.dat <==
96651131135
96651131135
96651420412
96651730095
96652399117
96652399142
96652399142
96652399142
96652399160
96652399271
Objective: to remove the numbers present in the barred list from the
PSP file.
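One standard way to meet this objective is to put the barred numbers in a `set`. Note that this is not the approach anyone in this thread uses. A membership test on a set is an average O(1) hash lookup instead of a scan of a list. This is a minimal sketch on in-memory data, and the function name `remove_barred` is hypothetical:

```python
def remove_barred(post_numbers, barred_numbers):
    """Return post_numbers minus every barred number, keeping the original order."""
    barred = set(barred_numbers)  # duplicates in the barred list collapse automatically
    return [n for n in post_numbers if n not in barred]

post = ["96653696338", "96652399142", "96653766996"]
barred = ["96652399142", "96652399142", "96651131135"]
kept = remove_barred(post, barred)
print(kept)
```

A plain `set(post) - set(barred)` would also work, but it loses the order of the PSP file and any repeated lines in it.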
$ ls -lh PSP0000320.dat CBR0000319.dat
... 56M Dec 28 19:41 PSP0000320.dat
... 8.6M Dec 28 19:40 CBR0000319.dat
$ wc -l PSP0000320.dat CBR0000319.dat
4,462,603 PSP0000320.dat
693,585 CBR0000319.dat
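These line counts make it easy to estimate the cost of a list-based membership test. This is a rough sketch. It assumes that `number in barrlist` on a list compares against every barred entry whenever the number is absent, which is the common case here because the barred file is a small subset:

```python
post_lines = 4_462_603
barred_lines = 693_585

# Worst case for `x in some_list`: one comparison per element when x is absent.
list_comparisons = post_lines * barred_lines
# With a set, each lookup is one hash probe on average.
set_lookups = post_lines

print(f"{list_comparisons:,} list comparisons vs about {set_lookups:,} set lookups")
# prints: 3,095,194,501,755 list comparisons vs about 4,462,603 set lookups
```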
I wrote the following in Python to do it:
#: c01:rmcommon.py
barredfile = open(r'/home/sjd/python/wip/CBR0000319.dat', 'r')
postfile = open(r'/home/sjd/python/wip/PSP0000320.dat', 'r')
outfile = open(r'/home/sjd/python/wip/PSP-CBR.dat', 'w')
# reading it all in one go, so as to avoid frequent disk accesses
# (assume machine has plenty of memory)
barrlist = barredfile.readlines()   # list of lines, newline included
postlist = postfile.readlines()
#
for number in postlist:
    # 'in' on a list compares against each element in turn
    if number in barrlist:
        pass
    else:
        outfile.write(number)
barredfile.close(); postfile.close(); outfile.close()
#:~
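For comparison, here is a sketch of the same job with the barred numbers held in a `set` and the PSP file streamed line by line. The function name `filter_file` is hypothetical. The sketch assumes one number per line and strips surrounding whitespace, so a missing final newline in either file does not cause a mismatch:

```python
def filter_file(post_path, barred_path, out_path):
    """Copy lines of post_path to out_path, skipping numbers listed in barred_path."""
    with open(barred_path) as barred_file:
        # Build the lookup table once; strip() drops newlines and stray spaces.
        barred = {line.strip() for line in barred_file if line.strip()}
    with open(post_path) as post_file, open(out_path, "w") as out_file:
        for line in post_file:  # iterating a file reads it one line at a time
            if line.strip() not in barred:
                out_file.write(line)
```

Usage with the paths from the script above would be `filter_file('/home/sjd/python/wip/PSP0000320.dat', '/home/sjd/python/wip/CBR0000319.dat', '/home/sjd/python/wip/PSP-CBR.dat')`.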
The above code simply takes too long to complete. If instead I run
diff -y PSP0000320.dat CBR0000319.dat, keep the lines marked '<' and clean them up with
sed -e 's/\([0-9]*\) *</\1/' > PSP-CBR.dat, it takes under 4 minutes to
complete.
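The `diff` trick relies on both files sharing the same sorted order. Under that assumption, a Python alternative needs no lookup table at all. It walks the two sorted inputs side by side like a merge, in O(n + m) time. This is a sketch, and `merge_exclude` is a hypothetical name. It assumes both inputs are sorted ascending, which the `head` output above suggests but does not guarantee. Equal-length digit strings sort the same way as the numbers they represent:

```python
def merge_exclude(post_sorted, barred_sorted):
    """Yield items of post_sorted that do not occur in barred_sorted.

    Both inputs must be sorted ascending with the same ordering.
    """
    barred_iter = iter(barred_sorted)
    current = next(barred_iter, None)  # None marks an exhausted barred list
    for item in post_sorted:
        # Advance the barred cursor past everything smaller than item.
        while current is not None and current < item:
            current = next(barred_iter, None)
        if current != item:
            yield item

post = ["96651131135", "96652399142", "96653696338"]
barred = ["96651131135", "96652399142", "96652399142"]
print(list(merge_exclude(post, barred)))
```

Unlike the set approach, this keeps almost nothing in memory, but it silently gives wrong results if either input is out of order.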
It should be quicker to do this:
for number in postlist:
    if number not in barrlist:
        outfile.write(number)
and quicker still doing this:
numbers = [number for number in postlist if number not in barrlist]
outfile.writelines(numbers)
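Rewriting the loop probably changes little while `barrlist` is a list, because most of the time goes into the linear `not in` scan. The following `timeit` sketch checks this on synthetic data. The sizes are chosen arbitrarily and are far smaller than the real files, and absolute timings will vary by machine:

```python
import random
import timeit

random.seed(0)
# Hypothetical synthetic data: 11-digit numbers as strings.
postlist = [str(random.randrange(10**10, 10**11)) for _ in range(10_000)]
barrlist = random.sample(postlist, 2_000)
barrset = set(barrlist)

def loop_list():
    out = []
    for number in postlist:
        if number not in barrlist:  # linear scan of a list
            out.append(number)
    return out

def comprehension_list():
    return [number for number in postlist if number not in barrlist]

def comprehension_set():
    return [number for number in postlist if number not in barrset]  # hash lookup

for fn in (loop_list, comprehension_list, comprehension_set):
    seconds = timeit.timeit(fn, number=1)
    print(f"{fn.__name__:20s} {seconds:.3f} s")
```

All three return the same list. Only the container used for the membership test should make a large difference.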
-- http://mail.python.org/mailman/listinfo/python-list