I have a very long list that contains many repeated elements. The
elements of the list can be either all numbers, or all strings, or all
dates [datetime.date].

I want to convert the list into a column matrix of integer codes: each
unique element of the list is assigned a consecutive integer starting
from zero, and every occurrence of that element gets the same integer.
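
To make the target concrete, here is a minimal sketch of the mapping described above. It is an illustration only, not the code being timed in this post. The name list2index_sketch is hypothetical, and numbering the codes in order of first appearance is an assumption (the post does not say which unique element should get which integer). It uses a plain dict lookup and returns a list of one-element rows; wrapping the result in numpy.asarray would give an N x 1 array.

```python
def list2index_sketch(values):
    """Map each element to a consecutive integer code, starting from zero.

    Hypothetical helper for illustration. Codes are assigned in order of
    first appearance, which is an assumption, not a stated requirement.
    """
    codes = {}  # element -> integer code
    idx = []    # one single-element row per input element (a column)
    for value in values:
        # The first time a value is seen it gets the next unused code;
        # len(codes) is evaluated before the key is inserted.
        idx.append([codes.setdefault(value, len(codes))])
    return idx, codes


# Works for any hashable elements: numbers, strings, or datetime.date.
idx, codes = list2index_sketch(['b', 'a', 'b', 'c', 'a'])
```

Because each element costs one dictionary lookup, the work grows with the length of the list rather than with the length times the number of unique elements.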

I've done it by brute force below. Any tips for making it faster? (5x
would make it useful; 10x would be a dream.)

>> list2index.test()
Numbers: 5.84955787659 seconds
Characters: 24.3192870617 seconds
Dates: 39.288228035 seconds


import datetime, time
from numpy import nan, asmatrix, ones

def list2index(L):

  # Find unique elements in list
  uL = dict.fromkeys(L).keys()

  # Convert list to matrix
  L = asmatrix(L).T

  # Initialize return matrix
  idx = nan * ones((L.size, 1))

  # Assign numbers to unique L values
  for i, uLi in enumerate(uL):
    idx[L == uLi,:] = i

  # Return the index matrix (test() assigns the result to idx)
  return idx

def test():

    L = 5000*range(255)
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Numbers:', t2-t1, 'seconds'

    L = 5000*[chr(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Characters:', t2-t1, 'seconds'

    d = datetime.date
    step = datetime.timedelta
    L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Dates:', t2-t1, 'seconds'
