[EMAIL PROTECTED] wrote:
Can someone help me improve this block of code? It just converts the
list of data into tokens based on the range each value falls into, but it
takes a long time. Can someone tell me what I can change to improve
it?
if data[i] in xrange(rngs[j],rngs[j+1]):
That's a bummer: you build an xrange and then search it linearly, one
element at a time, where all you want to know is
if rngs[j] <= data[i] < rngs[j+1]
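To spell out the difference, here is a rough sketch. The function names are mine and not from your script, and it is written so that it also runs on a current Python, hence range instead of xrange. The first function does by hand roughly what the `in xrange(...)` test does in Python 2, where xrange has no fast membership check and `in` falls back to stepping through the values; the second is the plain comparison.

```python
def in_bin_scan(x, lo, hi):
    # Walk through lo, lo+1, ..., hi-1 until x turns up: up to (hi - lo) steps.
    for v in range(lo, hi):
        if v == x:
            return True
    return False


def in_bin_compare(x, lo, hi):
    # Two comparisons, no matter how wide the bin is.
    return lo <= x < hi
```

For integer data both give the same answer; only the amount of work differs, and it grows with the bin width in the first case.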
Attached is a script that contains both your old version and my enhanced
one, and shows that the results are equal. Running only your version takes
~35s, whereas mine takes ~1s!
Another optimization, which I'm too lazy to do right now, would be a sort
of "tree search" (binary search) of data[i] in rngs: as the ranges are
ordered, you could find the proper one in about log_2(len(rngs)) steps
instead of len(rngs)/2.
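A minimal sketch of that binary search, using the standard bisect module. This is not part of the attached script: the name tokenize_bisect and the no_of_bins parameter are made up for illustration, the bin edges are kept in a plain list instead of a Numeric array, and I am assuming integer data as in your example.

```python
from bisect import bisect_right
from math import ceil


def tokenize_bisect(tk, data, no_of_bins=10):
    # Hypothetical variant of Tkz2: same bin edges, binary search per value.
    dmin = min(data)
    dmax = max(data) + 1
    rng = int(ceil(abs((dmax - dmin) / float(no_of_bins))))
    # Edges: dmin, dmin + rng, ..., dmin + no_of_bins * rng
    rngs = [dmin + rng * i for i in range(no_of_bins + 1)]
    tkns = []
    for x in data:
        # bisect_right counts the edges <= x, so the bin j with
        # rngs[j] <= x < rngs[j+1] is that count minus one.
        j = bisect_right(rngs, x) - 1
        tkns.append(str(tk) + str(j))
    return tkns
```

With only 10 bins the gain over the simple comparison loop is probably small; it should matter more as the number of bins grows.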
Additional improvements can be achieved when data is sorted, but that
depends on your application and on the actual size of the data.
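One way the sorted case could look, as a sketch only: if the values arrive in ascending order, the bin index never has to move backwards, so a single pass over data and rngs together is enough. The name tokenize_sorted is hypothetical, the input is assumed to be a non-empty, already sorted list of integers, and the tokens come out in sorted order, which may or may not be what your application needs.

```python
from math import ceil


def tokenize_sorted(tk, sorted_data, no_of_bins=10):
    # Assumes sorted_data is non-empty and in ascending order.
    dmin = sorted_data[0]
    dmax = sorted_data[-1] + 1
    rng = int(ceil(abs((dmax - dmin) / float(no_of_bins))))
    rngs = [dmin + rng * i for i in range(no_of_bins + 1)]
    tkns = []
    j = 0  # current bin; only ever advances
    for x in sorted_data:
        # Move on to the next bin while x lies at or beyond its upper edge.
        while x >= rngs[j + 1]:
            j += 1
        tkns.append(str(tk) + str(j))
    return tkns
```

Sorting first costs time too, so this only pays off if the data is sorted anyway or the order of the tokens does not matter.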
Diez
from math import *
from Numeric import *
from random import *

# Enhanced version: tests bin membership with a direct comparison.
def Tkz2(tk,data):
    no_of_bins = 10
    tkns = []
    dmax = max(data)+1
    dmin = min(data)
    # Bin width, rounded up so that the bins cover [dmin, dmax).
    rng = ceil(abs((dmax - dmin)/(no_of_bins*1.0)))
    rngs = zeros(no_of_bins+1)
    for i in xrange(no_of_bins+1):
        rngs[i] = dmin + (rng*i)
    for i in xrange(len(data)):
        for j in xrange(len(rngs)-1):
            if rngs[j] <= data[i] < rngs[j+1]:
                tkns.append( str(tk)+str(j) )
    return tkns

# Original version: tests bin membership by scanning an xrange.
def Tkz(tk,data):
    no_of_bins = 10
    tkns = []
    dmax = max(data)+1
    dmin = min(data)
    rng = ceil(abs((dmax - dmin)/(no_of_bins*1.0)))
    rngs = zeros(no_of_bins+1)
    for i in xrange(no_of_bins+1):
        rngs[i] = dmin + (rng*i)
    for i in xrange(len(data)):
        for j in xrange(len(rngs)-1):
            if data[i] in xrange(rngs[j], rngs[j+1]):
                tkns.append( str(tk)+str(j) )
    return tkns

# Run both versions on the same shuffled data and compare the results.
data = range(20,12312)
shuffle(data)
res1 = Tkz('A', data)
res2 = Tkz2('A', data)
print res1 == res2