Re: split an iteration
Robin Becker wrote: eg for e = enumerate([0,1,2,3,4,5]) for i,a in e: ... if a==3: break ... for i,a in e: ... print i,a ... 4 4 5 5 I think the second loop needs to start at 3 ie the split needs to be start, limit semantics It would be nice to be able to fix it with a move back method. I have to reread your previous post, it seems. Meanwhile: e = enumerate(range(6)) for i, a in e: ... if a == 3: ... for i, a in itertools.chain([(i, a)], e): ... print i, a ... break ... 3 3 4 4 5 5 Nesting the loops is necessary to handle empty lists and lists with no matching item correctly. Alternatively, you could set a 'found' flag. Another option: def predicate((i, a)): return a != 3 ... for i, a in itertools.dropwhile(predicate, enumerate(range(6))): ... print i, a ... 3 3 4 4 5 5 The extra function call might have a negative impact on performance, though. Peter -- http://mail.python.org/mailman/listinfo/python-list
split an iteration
This function from texlib in oedipus.sf.net is a real cpu hog and I determined to see if it could be optimized. def add_active_node(self, active_nodes, node): Add a node to the active node list. The node is added so that the list of active nodes is always sorted by line number, and so that the set of (position, line, fitness_class) tuples has no repeated values. index = 0 # Find the first index at which the active node's line number # is equal to or greater than the line for 'node'. This gives # us the insertion point. while (index len(active_nodes) and active_nodes[index].line node.line): index = index + 1 insert_index = index # Check if there's a node with the same line number and # position and fitness. This lets us ensure that the list of # active nodes always has unique (line, position, fitness) # values. while (index len(active_nodes) and active_nodes[index].line == node.line): if (active_nodes[index].fitness_class == node.fitness_class and active_nodes[index].position == node.position): # A match, so just return without adding the node return index = index + 1 active_nodes.insert(insert_index, node) I determined that len was being called a large number of times so did a first rewrite as (minus comments) def add_active_node(self, active_nodes, node): index = 0 nan = len(active_nodes) node_line = node.line node_fitness_class = node.fitness_class node_position = node.position while index nan and active_nodes[index].linenode_line: index += 1 insert_index = index while indexnan and active_nodes[index].line==node_line: if (active_nodes[index].fitness_class==node_fitness_class and active_nodes[index].position==node_position): return index += 1 active_nodes.insert(insert_index, node) which gives a good speedup even so I decided to eliminate the index += 1 in the first while loop which results in def add_active_node(self, active_nodes, node): nan = len(active_nodes) node_line = node.line node_fitness_class = node.fitness_class node_position = node.position insert_index = nan for index, a in enumerate(active_nodes): if a.line=node_line: insert_index = index break index = insert_index while indexnan and active_nodes[index].line==node_line: if (active_nodes[index].fitness_class==node_fitness_class and active_nodes[index].position==node_position): return index += 1 active_nodes.insert(insert_index, node) and this change also results in a significant speedup and is I believe is still equivalent. I'm not sure exactly why this is faster than the while loop, but it is. However, it seems harder to get the same speedup for the last while loop; that loop is probably not such a problem so it's not terribly important. Is there a fast way to get enumerate to operate over a slice of an iterable? -- Robin Becker -- http://mail.python.org/mailman/listinfo/python-list
Re: split an iteration
Robin Becker wrote: Is there a fast way to get enumerate to operate over a slice of an iterable? I think you don't need that here: e = enumerate(active_nodes) for insert_index, a in e: # ... for index, a in e: # ... Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: split an iteration
[Robin Becker] This function from texlib in oedipus.sf.net is a real cpu hog and I determined to see if it could be optimized. def add_active_node(self, active_nodes, node): Add a node to the active node list. The node is added so that the list of active nodes is always sorted by line number, and so that the set of (position, line, fitness_class) tuples has no repeated values. If you can change the data structure to be an actual list of tuples, then the bisect module can be used directly: insert_index = bisect.bisect_left(active_nodes, node) if active_nodes[insert_index] == node: return# avoid creating a duplicate entry active_nodes.insert(insert_index, node) If the data structure cannot be changed to tuples, then try adding a custom compare operation to the node class: def __cmp__(self, other): return cmp((self.line, self.position, self.fitness_class), (other.line, other.position, other.fitness_class)) insert_index = nan for index, a in enumerate(active_nodes): if a.line=node_line: insert_index = index break index = insert_index This loop can be squeezed a bit more using itertools.imap() and operator.attrgetter() for the attribute lookup: for index, aline in enumerate(imap(attrgetter('line'), active_nodes): if aline node_line: . . . Is there a fast way to get enumerate to operate over a slice of an iterable? enumerate(s) is the same as izip(count(), s). So, to start from position i, write: for index, value in izip(count(i), s[i:]): . . . That being said, your best bet is to eliminate the initial linear search which is likely consuming most of the clock cycles. Also, I noticed that the code does not reference self. Accordingly, it is a good candidate for being a staticmethod or standalone function. Raymond Hettinger -- http://mail.python.org/mailman/listinfo/python-list
Re: split an iteration
Peter Otten wrote: Robin Becker wrote: Is there a fast way to get enumerate to operate over a slice of an iterable? I think you don't need that here: e = enumerate(active_nodes) for insert_index, a in e: # ... for index, a in e: # ... Peter I tried your solution, but I think we miss the split element eg for e = enumerate([0,1,2,3,4,5]) for i,a in e: ... if a==3: break ... for i,a in e: ... print i,a ... 4 4 5 5 I think the second loop needs to start at 3 ie the split needs to be start, limit semantics It would be nice to be able to fix it with a move back method. -- Robin Becker -- http://mail.python.org/mailman/listinfo/python-list
Re: split an iteration
Raymond Hettinger wrote: [Robin Becker] This function from texlib in oedipus.sf.net is a real cpu hog and I determined to see if it could be optimized. def add_active_node(self, active_nodes, node): Add a node to the active node list. The node is added so that the list of active nodes is always sorted by line number, and so that the set of (position, line, fitness_class) tuples has no repeated values. If you can change the data structure to be an actual list of tuples, then the bisect module can be used directly: This is a way forward and I think is doable. The original Knuth algo used only global integer arrays. The actual insert point depends on the line attribute only which would be the first tuple element. So apparently we always insert non-identical line breaks at the beginning of their line group; I think we can do the insert check and then a bit of checking to find the actual insert point using bisect_right handwave handwave. insert_index = bisect.bisect_left(active_nodes, node) if active_nodes[insert_index] == node: return# avoid creating a duplicate entry active_nodes.insert(insert_index, node) If the data structure cannot be changed to tuples, then try adding a custom compare operation to the node class: def __cmp__(self, other): return cmp((self.line, self.position, self.fitness_class), (other.line, other.position, other.fitness_class)) insert_index = nan for index, a in enumerate(active_nodes): if a.line=node_line: insert_index = index break index = insert_index This loop can be squeezed a bit more using itertools.imap() and operator.attrgetter() for the attribute lookup: for index, aline in enumerate(imap(attrgetter('line'), active_nodes): if aline node_line: . . . Is there a fast way to get enumerate to operate over a slice of an iterable? enumerate(s) is the same as izip(count(), s). So, to start from position i, write: for index, value in izip(count(i), s[i:]): . . . That being said, your best bet is to eliminate the initial linear search which is likely consuming most of the clock cycles. Also, I noticed that the code does not reference self. Accordingly, it is a good candidate for being a staticmethod or standalone function. Raymond Hettinger -- Robin Becker -- http://mail.python.org/mailman/listinfo/python-list