I have a very long list that contains many repeated elements. The elements of the list are either all numbers, all strings, or all dates (datetime.date).
I want to convert the list into a matrix where each unique element of the list is assigned a consecutive integer starting from zero. I've done it by brute force below. Any tips for making it faster? (5x would make it useful; 10x would be a dream.)

>> list2index.test()
Numbers: 5.84955787659 seconds
Characters: 24.3192870617 seconds
Dates: 39.288228035 seconds

import datetime, time
from numpy import nan, asmatrix, ones

def list2index(L):

    # Find unique elements in list
    uL = dict.fromkeys(L).keys()

    # Convert list to matrix
    L = asmatrix(L).T

    # Initialize return matrix
    idx = nan * ones((L.size, 1))

    # Assign numbers to unique L values
    for i, uLi in enumerate(uL):
        idx[L == uLi,:] = i

    # Return the index matrix (test() below expects a return value)
    return idx

def test():

    L = 5000*range(255)
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Numbers:', t2-t1, 'seconds'

    L = 5000*[chr(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Characters:', t2-t1, 'seconds'

    d = datetime.date
    step = datetime.timedelta
    L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
    t1 = time.time()
    idx = list2index(L)
    t2 = time.time()
    print 'Dates:', t2-t1, 'seconds'

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion
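For comparison with the brute-force list2index above, here is a minimal sketch of one possible alternative. It is untimed, so no speedup is claimed. Instead of comparing the whole matrix against each unique value, it makes a single pass over the list and looks each element up in a dictionary. The name list2index_dict is hypothetical. Two assumptions differ from the original: codes are assigned in order of first appearance, and the result is an integer column array rather than a float matrix initialized with nan.

```python
import datetime

import numpy as np


def list2index_dict(values):
    """Map each distinct element of `values` to a consecutive integer from 0.

    Hypothetical helper, not from the original post. Elements must be
    hashable (numbers, strings and datetime.date objects all are).
    """
    codes = {}  # element -> integer code, in order of first appearance
    idx = np.empty((len(values), 1), dtype=np.intp)
    for row, value in enumerate(values):
        # setdefault stores len(codes) the first time a value is seen,
        # and returns the already stored code on every later occurrence.
        idx[row, 0] = codes.setdefault(value, len(codes))
    return idx, codes


if __name__ == "__main__":
    d = datetime.date
    step = datetime.timedelta
    dates = 2 * [d(2006, 1, 1) + step(z) for z in range(3)]
    idx, codes = list2index_dict(dates)
    print(idx.ravel())  # [0 1 2 0 1 2]
```

The work here is one dictionary lookup per element, so the cost should grow with the length of the list rather than with the length times the number of unique values. Whether that reaches the hoped-for 5x to 10x would need to be measured with the same test() inputs.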