[Robin Becker] > This function from texlib in oedipus.sf.net is a real cpu hog and I determined > to see if it could be optimized. > > def add_active_node(self, active_nodes, node): > """Add a node to the active node list. > The node is added so that the list of active nodes is always > sorted by line number, and so that the set of (position, line, > fitness_class) tuples has no repeated values. > """
If you can change the data structure to be an actual list of tuples, then the bisect module can be used directly: insert_index = bisect.bisect_left(active_nodes, node) if active_nodes[insert_index] == node: return # avoid creating a duplicate entry active_nodes.insert(insert_index, node) If the data structure cannot be changed to tuples, then try adding a custom compare operation to the node class: def __cmp__(self, other): return cmp((self.line, self.position, self.fitness_class), (other.line, other.position, other.fitness_class)) > insert_index = nan > for index, a in enumerate(active_nodes): > if a.line>=node_line: > insert_index = index > break > index = insert_index This loop can be squeezed a bit more using itertools.imap() and operator.attrgetter() for the attribute lookup: for index, aline in enumerate(imap(attrgetter('line'), active_nodes): if aline > node_line: . . . > Is there a fast way to get enumerate to operate over a slice of an iterable? enumerate(s) is the same as izip(count(), s). So, to start from position i, write: for index, value in izip(count(i), s[i:]): . . . That being said, your best bet is to eliminate the initial linear search which is likely consuming most of the clock cycles. Also, I noticed that the code does not reference self. Accordingly, it is a good candidate for being a staticmethod or standalone function. Raymond Hettinger -- http://mail.python.org/mailman/listinfo/python-list