Keith Goodman wrote:
> I have a very long list that contains many repeated elements. The
> elements of the list can be either all numbers, or all strings, or all
> dates [datetime.date].
>
> I want to convert the list into a matrix where each unique element of
> the list is assigned a consecutive integer starting from zero.
>
If what you want is for the first unique element to get zero, the second
to get one, and so on, I don't think the code below will work in general,
since the dict does not preserve order. You might want to look at the
results for the character case to see what I mean. If you're looking for
something else, you'll need to elaborate a bit. Since list2index doesn't
return anything, it's not entirely clear what the answer consists of.
Just idx? Idx plus uL?
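The question of what the answer should contain can be made concrete. Below is a minimal sketch in modern Python 3 (not code from this thread; the names index_first_occurrence and index_sorted are made up for illustration) that returns both idx and the unique values uL, under two numbering conventions. The first uses the same dict lookup as list2index2 and numbers values in order of first appearance; getting uL back in that order relies on dicts preserving insertion order, which Python guarantees only from 3.7 on (the Python 2 dicts discussed here made no such guarantee). The second swaps in numpy.unique with return_inverse=True, which numbers values in sorted order instead and therefore needs mutually comparable elements. Neither has been timed against the figures in this thread.

```python
import numpy as np


def index_first_occurrence(values):
    """Number unique values 0, 1, 2, ... in order of first appearance.

    Returns (idx, uniques): idx[i] is the integer assigned to values[i],
    and uniques[k] is the value that was assigned integer k.
    """
    codes = {}  # value -> integer; dicts keep insertion order in Python 3.7+
    idx = np.empty(len(values), dtype=np.intp)
    for i, x in enumerate(values):
        code = codes.get(x)
        if code is None:
            # First time x is seen: give it the next unused integer.
            code = codes[x] = len(codes)
        idx[i] = code
    return idx, list(codes)


def index_sorted(values):
    """Number unique values 0, 1, 2, ... in sorted order, using NumPy only.

    Returns (idx, uniques) with the same meaning as above, except that
    uniques is a sorted NumPy array rather than a list.
    """
    uniques, idx = np.unique(np.asarray(values), return_inverse=True)
    return idx, uniques
```

For example, ['c', 'a', 'c', 'b', 'a'] gives idx [0, 1, 0, 2, 1] with uL ['c', 'a', 'b'] under the first convention, and idx [2, 0, 2, 1, 0] with uL ['a', 'b', 'c'] under the second.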
> I've done it by brute force below. Any tips for making it faster? (5x
> would make it useful; 10x would be a dream.)
>
Assuming I understand what you're trying to do, this might help:

    from numpy import ones

    def list2index2(L):
        # idx[i] will hold the integer assigned to L[i]
        idx = ones([len(L)])
        # map: element -> integer, assigned in order of first appearance
        map = {}
        for i, x in enumerate(L):
            index = map.get(x)
            if index is None:
                # first time x is seen: give it the next unused integer
                map[x] = index = len(map)
            idx[i] = index
        return idx

It's almost 10x faster for numbers and about 40x faster for characters
and dates. However, it produces different results from list2index in the
last two cases. That may or may not be a good thing, depending on what
you're really trying to do.

-tim

> >>> list2index.test()
> Numbers: 5.84955787659 seconds
> Characters: 24.3192870617 seconds
> Dates: 39.288228035 seconds
>
>
> import datetime, time
> from numpy import nan, asmatrix, ones
>
> def list2index(L):
>
>     # Find unique elements in list
>     uL = dict.fromkeys(L).keys()
>
>     # Convert list to matrix
>     L = asmatrix(L).T
>
>     # Initialize return matrix
>     idx = nan * ones((L.size, 1))
>
>     # Assign numbers to unique L values
>     for i, uLi in enumerate(uL):
>         idx[L == uLi,:] = i
>
> def test():
>
>     L = 5000*range(255)
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Numbers:', t2-t1, 'seconds'
>
>     L = 5000*[chr(z) for z in range(255)]
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Characters:', t2-t1, 'seconds'
>
>     d = datetime.date
>     step = datetime.timedelta
>     L = 5000*[d(2006,1,1)+step(z) for z in range(255)]
>     t1 = time.time()
>     idx = list2index(L)
>     t2 = time.time()
>     print 'Dates:', t2-t1, 'seconds'
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion