Re: [Tutor] __iter__ loops, partitioning list among children
I finally got my iterator-based version working, only to discover that it's nearly four times slower than the brute-force multiple-loops version I started with! Then I tried just adding an incrementing index to the loop, so that each loop only ran through self.events[last_index:], but that was still twice as slow as without the index. I suppose it's the overhead of incrementing the variable, or maybe some optimization in Python's internals, but the take home lesson was definitely 'leave well enough alone'. Anyway, thanks again for the advice, it's been a learning experience... E On Aug 27, 2008, at 2:22 AM, Kent Johnson wrote: On Tue, Aug 26, 2008 at 1:24 PM, Eric Abrahamsen [EMAIL PROTECTED] wrote: On Aug 26, 2008, at 7:20 PM, Kent Johnson wrote: If all you want to do with the nested Month, etc is to iterate the events in them, you could probably use a shared iterator. It would have to be able to push-back items so that when you hit the out-of-range item you could push it back on the iterator. Is a 'shared iterator' something special, or do you mean all the instances would draw their events from a single iterator? I'm not sure what this would look like. It's nothing special, I just mean that all instances would share an iterator. You would pass the iterator to the iteration function. Just for the sake of argument, here's the principle I'm working from: # lst = range(10) iterlst = iter(lst) iterlst.next() 0 for x in iterlst: ... if x 5: ... print x ... else: ... break ... 1 2 3 4 for x in iterlst: ... print x ... 6 7 8 9 # So that's why I'm creating the iterator outside of the while loop in the original code, and then using a repeated for loop with a break to step through all the events only once. Of course, the fact that 5 isn't in there probably points to the source of my problems! The solution might be assigning iterlist.next() to a variable, and advancing the variable. The problem is that the first loop consumes the 5 from the iterator but doesn't actually process it. That is why you need an iterator with push-back - so you can put the 5 back in the iterator and get it out again in the next loop. http://code.activestate.com/recipes/502304/ Though working with indices directly might be simpler. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
Just for the sake of argument, here's the principle I'm working from: # lst = range(10) iterlst = iter(lst) iterlst.next() 0 for x in iterlst: ... if x 5: ... print x ... else: ... break ... 1 2 3 4 for x in iterlst: ... print x ... 6 7 8 9 # If that contrived case is the case, you could change the code a bit to make 5 appears: for x in iterlst: print x if x = 5: break for x in iterlst: print x So that's why I'm creating the iterator outside of the while loop in the original code, and then using a repeated for loop with a break to step through all the events only once. Of course, the fact that 5 isn't in there probably points to the source of my problems! The solution might be assigning iterlist.next() to a variable, and advancing the variable. The problem is that the first loop consumes the 5 from the iterator but doesn't actually process it. That is why you need an iterator with push-back - so you can put the 5 back in the iterator and get it out again in the next loop. http://code.activestate.com/recipes/502304/ Though working with indices directly might be simpler. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
On Tue, Aug 26, 2008 at 1:36 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: So my test case: a Month has a 'child' attribute pointing at Week, which has a 'child' attribute pointing at Day, so they all know what kind of child instances iteration should produce. With nested loops, a Month produces one Week, that Week produces seven Days, then the next Week is produced, it makes seven more Days, etc. That much is easy. If all you want to do with the nested Month, etc is to iterate the events in them, you could probably use a shared iterator. It would have to be able to push-back items so that when you hit the out-of-range item you could push it back on the iterator. Then there's self.events. My original code looped over all of self.events for each child produced. A Month loops over its events four times, a Week seven times. This was the straightforward implementation, but it seemed inefficient. (I also, as you point out, might have been wrong about the way django QuerySets work). My thought was that only one loop over self.events should be necessary, in theory, since they're sorted by date. Instead of direct use of the list iterator, you could pass the list and the starting index around. Then you don't have to keep finding your place. A for loop creates an iterator from a sequence and calls next() on it, and it creates an entirely new iterator each time you start a new for loop: I still don't understand your code but you may have another misconception. Calling iter() on an iterator returns the same iterator; it does not make a new iterator that holds the same place. You can use itertools.tee() to split an iterator if that is what you want to do. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
On Aug 26, 2008, at 7:20 PM, Kent Johnson wrote: On Tue, Aug 26, 2008 at 1:36 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: So my test case: a Month has a 'child' attribute pointing at Week, which has a 'child' attribute pointing at Day, so they all know what kind of child instances iteration should produce. With nested loops, a Month produces one Week, that Week produces seven Days, then the next Week is produced, it makes seven more Days, etc. That much is easy. If all you want to do with the nested Month, etc is to iterate the events in them, you could probably use a shared iterator. It would have to be able to push-back items so that when you hit the out-of-range item you could push it back on the iterator. Is a 'shared iterator' something special, or do you mean all the instances would draw their events from a single iterator? I'm not sure what this would look like. Then there's self.events. My original code looped over all of self.events for each child produced. A Month loops over its events four times, a Week seven times. This was the straightforward implementation, but it seemed inefficient. (I also, as you point out, might have been wrong about the way django QuerySets work). My thought was that only one loop over self.events should be necessary, in theory, since they're sorted by date. Instead of direct use of the list iterator, you could pass the list and the starting index around. Then you don't have to keep finding your place. That's a definite possibility, I'll try it out. Itertools.tee is also something I've yet to look at closely. Thanks for these hints. A for loop creates an iterator from a sequence and calls next() on it, and it creates an entirely new iterator each time you start a new for loop: I still don't understand your code but you may have another misconception. Calling iter() on an iterator returns the same iterator; it does not make a new iterator that holds the same place. You can use itertools.tee() to split an iterator if that is what you want to do. Just for the sake of argument, here's the principle I'm working from: # lst = range(10) iterlst = iter(lst) iterlst.next() 0 for x in iterlst: ... if x 5: ... print x ... else: ... break ... 1 2 3 4 for x in iterlst: ... print x ... 6 7 8 9 # So that's why I'm creating the iterator outside of the while loop in the original code, and then using a repeated for loop with a break to step through all the events only once. Of course, the fact that 5 isn't in there probably points to the source of my problems! The solution might be assigning iterlist.next() to a variable, and advancing the variable. Thanks, Eric ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
Eric Abrahamsen [EMAIL PROTECTED] wrote So that's why I'm creating the iterator outside of the while loop in the original code, and then using a repeated for loop with a break to step through all the events only once. Of course, the fact that 5 isn't in there probably points to the source of my problems! The solution might be assigning iterlist.next() to a variable, and advancing the variable. In that case wouldn't it be simpler to just use the index and manually control its value? Iterators are wonderful things but when I have to start bending things too far from vanilla I tend to go back to first principles. -- Alan Gauld Author of the Learn to Program web site http://www.freenetpages.co.uk/hp/alan.gauld ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
On Tue, Aug 26, 2008 at 1:24 PM, Eric Abrahamsen [EMAIL PROTECTED] wrote: On Aug 26, 2008, at 7:20 PM, Kent Johnson wrote: If all you want to do with the nested Month, etc is to iterate the events in them, you could probably use a shared iterator. It would have to be able to push-back items so that when you hit the out-of-range item you could push it back on the iterator. Is a 'shared iterator' something special, or do you mean all the instances would draw their events from a single iterator? I'm not sure what this would look like. It's nothing special, I just mean that all instances would share an iterator. You would pass the iterator to the iteration function. Just for the sake of argument, here's the principle I'm working from: # lst = range(10) iterlst = iter(lst) iterlst.next() 0 for x in iterlst: ... if x 5: ... print x ... else: ... break ... 1 2 3 4 for x in iterlst: ... print x ... 6 7 8 9 # So that's why I'm creating the iterator outside of the while loop in the original code, and then using a repeated for loop with a break to step through all the events only once. Of course, the fact that 5 isn't in there probably points to the source of my problems! The solution might be assigning iterlist.next() to a variable, and advancing the variable. The problem is that the first loop consumes the 5 from the iterator but doesn't actually process it. That is why you need an iterator with push-back - so you can put the 5 back in the iterator and get it out again in the next loop. http://code.activestate.com/recipes/502304/ Though working with indices directly might be simpler. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
Okay I think I'm onto something, more iterator-related stuff. If I can make self.events an iterator, then run a for loop on it, breaking out of the loop when the events' date attributes get too high. Then on the next run through, that same for loop should pick up where it left off, right? Here's my next version of the function. It doesn't quite work right: when I test it each child instance receives the correct events, but when it passes those events onto its children, they get consumed (or something) and are never actually output. To test I'm instantiating a Month m, filling it with events, then looping like so: for w in m: print w.event_count() for d in w: print d.events w.event_count() produces the right event count, but d.events is always empty. If anyone can see what's wrong with this function... ## def _iter_children(self, child, require_events=False): Iterate through an object's 'child' items. If require_events == True, only return children with events, otherwise return all children. iterevents = iter(self.events) while self.sentinel self.stop: c = child([], self.sentinel, self.start_attr, rolling=self.rolling, counts_only=self.counts_only) for e in iterevents: if getattr(e, self.start_attr) c.stop: c.events.append(e) else: break self.sentinel += c.dt_range if not require_events or c.has_events(): # if require_events == True, omit empty children. yield c On Aug 25, 2008, at 11:49 AM, Eric Abrahamsen wrote: On Aug 24, 2008, at 7:20 PM, Kent Johnson wrote: Forwarding to the list with my reply. Please use Reply All to reply to the list. Grr, sorry, I keep forgetting... On Sun, Aug 24, 2008 at 1:02 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: On Aug 23, 2008, at 11:22 PM, Kent Johnson wrote: On Sat, Aug 23, 2008 at 6:47 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: At first I thought the bisect module was the way to go, but it is too tightly tied to integer list indices, and works very awkwardly when bisecting on datetime attributes. I'm not sure what the problem is with bisect (to find the starting point). Your list of model elements has integer indices. I think you have to write a __cmp__ method for your model class that compares on the datetime attribute, then it should work. You can also create a list of (key, model) and sort/search that. http://www.mail-archive.com/[EMAIL PROTECTED]/msg189443.html The __cmp__ trick is very nice, and would do it, except I won't have access to the events model classes. I could get bisect to work by feeding it a list comprehension, but making the high and low parameters work required integer indices, which seemed like one half-hack too many... I don't understand the issue. *All* lists have integer indices. If you make a list of (key, model) pairs, the keys only need to be comparable, not integers. The main problem is that I don't have any control over the list of models, and all I've got is the name of a datetime attribute to filter by. The only way I could get bisect to work was like this: index = bisect([getattr(x, attr_name) for x in model_list], sentinel_date) But that loops over the whole list, which is what I was trying to avoid. And the only way I could provide high and low parameters to bisect is by calling event_list.index(event), which doesn't work on django querysets. I also experimented with itertools.groupby to produce groups of events, but that turned out to be far slower than simply looping over the whole event list and extracting events which test true for c.start = getattr(event, attr_name) c.stop. Ideally I could create a kind of concurrent iterator, that steps through children's blocks of time and the object's event list in tandem, rather than going over the whole events list once for every child produced. Maybe that's not possible... I'm pasting the whole function down below, just for the hell of it. Thanks again, Eric def _iter_children(self, child, require_events=False): Iterate through an object's 'child' items. If require_events == True, only return children with events, otherwise return all children. while self.sentinel self.stop: c = child([], self.sentinel, self.start_attr, rolling=self.rolling, counts_only=self.counts_only) for e in self.events: if c.start = getattr(e,self.start_attr) c.stop: c.events.append(e) self.sentinel += c.dt_range if not require_events or c.has_events(): yield c ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor ___ Tutor maillist - Tutor@python.org
Re: [Tutor] __iter__ loops, partitioning list among children
I'm not following your code very well. I don't understand the relationship between the first loop and the iter_children() function. A couple of things that might help: - Django QuerySets can be qualified with additional tests, so you could have each of your month/week/etc classes have its own correctly qualified QuerySet. This will result in one database query for each event class, and multiple copies of the actual events. - When you start iterating a QuerySet, it fetches all the model instances into a list. I think you are trying to use iterators to prevent this fetch but Django doesnt' work that way. You might as well just use the list. - Iterators can't be restarted, I think that is why your latest iter_children() doesn't work as you expect. Can you say some more about why you are doing this? Where do all the initial constraints come from? Do you really have to be so careful to protect against a 'madman' user? Perhaps you could set a limit on the number of events you will retrieve and use a count() on the QuerySet to ensure that not too many records have been fetched. Maybe you should try a straightforward implementation and when you have something working you can worry about making it better. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
I do apologize for the large quantities of confusing description – articulating the problem here has helped me understand exactly what it is I'm after (though it hasn't improved my code!), and I've got a better grasp of the problem now than I did when I first asked. It isn't so much that I need to protect against crazy users, but that the pattern I'm making is very flexible, and could be used in vastly different ways. I want to make sure that it operates on efficient principles, so that people will get the best performance out of it no matter how they use it. So my test case: a Month has a 'child' attribute pointing at Week, which has a 'child' attribute pointing at Day, so they all know what kind of child instances iteration should produce. With nested loops, a Month produces one Week, that Week produces seven Days, then the next Week is produced, it makes seven more Days, etc. That much is easy. Then there's self.events. My original code looped over all of self.events for each child produced. A Month loops over its events four times, a Week seven times. This was the straightforward implementation, but it seemed inefficient. (I also, as you point out, might have been wrong about the way django QuerySets work). My thought was that only one loop over self.events should be necessary, in theory, since they're sorted by date. A for loop creates an iterator from a sequence and calls next() on it, and it creates an entirely new iterator each time you start a new for loop: each for loop starts from the beginning of the sequence. But if you create your own iterator from the sequence and run a for loop on it, then using break to jump out of the for loop should leave the iterator just where you left it, since it maintains state. Doing another for loop on it (the next time through the date-based while loop), should pick up where it left off. That's why the line iterevents = iter(self.events) is outside of the while loop: it is only created once, and the loops later in the function all make use of the same iterator, instead of creating a new one every time. I'm pretty sure this works in theory, because calling event_count() on the Weeks as they come out returns the correct number of events. But, for some reason, those events are not making it into the Day children. I had originally assumed that a QuerySet pulled objects out of the database in a rolling fashion – ie iterating on a Month would first draw a Week's worth of events from the database, then another Week, then two more. But if it loads them all at first access, then I might as well just call list() on the QuerySet at object instantiation, and save myself some trouble. I hope that's a little clearer. My central issue is maintaining my place in the self.events loop, and only advancing it as far as the date-based while loop advances. Whether that's possible or not... Thanks again, Eric On Aug 26, 2008, at 1:12 AM, Kent Johnson wrote: I'm not following your code very well. I don't understand the relationship between the first loop and the iter_children() function. A couple of things that might help: - Django QuerySets can be qualified with additional tests, so you could have each of your month/week/etc classes have its own correctly qualified QuerySet. This will result in one database query for each event class, and multiple copies of the actual events. - When you start iterating a QuerySet, it fetches all the model instances into a list. I think you are trying to use iterators to prevent this fetch but Django doesnt' work that way. You might as well just use the list. - Iterators can't be restarted, I think that is why your latest iter_children() doesn't work as you expect. Can you say some more about why you are doing this? Where do all the initial constraints come from? Do you really have to be so careful to protect against a 'madman' user? Perhaps you could set a limit on the number of events you will retrieve and use a count() on the QuerySet to ensure that not too many records have been fetched. Maybe you should try a straightforward implementation and when you have something working you can worry about making it better. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
On Aug 24, 2008, at 7:20 PM, Kent Johnson wrote: Forwarding to the list with my reply. Please use Reply All to reply to the list. Grr, sorry, I keep forgetting... On Sun, Aug 24, 2008 at 1:02 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: On Aug 23, 2008, at 11:22 PM, Kent Johnson wrote: On Sat, Aug 23, 2008 at 6:47 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: At first I thought the bisect module was the way to go, but it is too tightly tied to integer list indices, and works very awkwardly when bisecting on datetime attributes. I'm not sure what the problem is with bisect (to find the starting point). Your list of model elements has integer indices. I think you have to write a __cmp__ method for your model class that compares on the datetime attribute, then it should work. You can also create a list of (key, model) and sort/search that. http://www.mail-archive.com/[EMAIL PROTECTED]/msg189443.html The __cmp__ trick is very nice, and would do it, except I won't have access to the events model classes. I could get bisect to work by feeding it a list comprehension, but making the high and low parameters work required integer indices, which seemed like one half-hack too many... I don't understand the issue. *All* lists have integer indices. If you make a list of (key, model) pairs, the keys only need to be comparable, not integers. The main problem is that I don't have any control over the list of models, and all I've got is the name of a datetime attribute to filter by. The only way I could get bisect to work was like this: index = bisect([getattr(x, attr_name) for x in model_list], sentinel_date) But that loops over the whole list, which is what I was trying to avoid. And the only way I could provide high and low parameters to bisect is by calling event_list.index(event), which doesn't work on django querysets. I also experimented with itertools.groupby to produce groups of events, but that turned out to be far slower than simply looping over the whole event list and extracting events which test true for c.start = getattr(event, attr_name) c.stop. Ideally I could create a kind of concurrent iterator, that steps through children's blocks of time and the object's event list in tandem, rather than going over the whole events list once for every child produced. Maybe that's not possible... I'm pasting the whole function down below, just for the hell of it. Thanks again, Eric def _iter_children(self, child, require_events=False): Iterate through an object's 'child' items. If require_events == True, only return children with events, otherwise return all children. while self.sentinel self.stop: c = child([], self.sentinel, self.start_attr, rolling=self.rolling, counts_only=self.counts_only) for e in self.events: if c.start = getattr(e,self.start_attr) c.stop: c.events.append(e) self.sentinel += c.dt_range if not require_events or c.has_events(): yield c ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] __iter__ loops, partitioning list among children
Hi, I've got a problem that takes a bit of explaining, but it's relatively simple when you get down to it. This is another django-related thing, but the issue itself is pure python. I made a custom class, called an EventEngine, which represents a span of time. You initialize it with a queryset/list of django model instances, and the resulting EventEngine object has an 'events' attribute, which is a queryset/list of model instances that fall within that span of time, according to a datetime attribute on the django model. Next come EventEngine subclasses, representing commonly used spans of time – Month, Week, Day, etc. The EventEngine __iter__ method is overridden so that each of these subclasses, when iterated over, produces child instances of the next smallest EventEngine subclass. Iterating on a Month produces Weeks, iterating on a Week produces Days and so on. As each parent produces its children, the events in the 'events' attribute get partitioned out to the children as appropriate to their date ranges. Right now I'm doing this as stupidly as possible: for each child 'c' in the parent, I loop over all the parent events, and append them to c.events if their date attribute falls within the child's date range: for e in self.events: if c.start = getattr(e,self.start_attr) c.stop: c.events.append(e) This is stupid for (at least) two reasons: 1) it evaluates the entire queryset and sticks it in memory with the first child instance, obviating the whole point of an iterable, and 2) it loops over the entire event list for every child, even though the list is already properly sorted by date, and we could break from the loop with the first event that falls outside the child's date range. Performance is important, as some madman could initialize a Year with a huge queryset, and do nested iterations all the way down to Minute. So I've got a few considerations: 1. The parent's events queryset should only be evaluated as much as is needed to yield one child at a time. 2. The parent's events queryset should not be duplicated as it is distributed among the children, ie it should make references and not copies (I don't know how to make sure this happens). 3. I don't want to empty the parent's 'events' list as it is distributed among the children, as other methods depend on the list being there. 4. The events attribute of the 'top level' instance is a django queryset, but it will be a list for all descendent instances (they can be nested arbitrarily deeply). This limits the number of functions available – ie you can't pop() a django queryset. At first I thought the bisect module was the way to go, but it is too tightly tied to integer list indices, and works very awkwardly when bisecting on datetime attributes. List slices produce copies, unless I'm mistaken. Something tells me that a plain old while loop with a break might do the trick, but I can't quite see it clear. Ideally this would be something that starts iterating where the date attribute falls into the right range, and then stops iterating as soon as it passes out of that range. Probably I'll have to settle for just one or the other, but one can dream. Thanks for reading all this. I'm hoping that all the restrictions and conditions will make the solution more obvious, rather than less obvious, to those who know more than I... --Eric ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] __iter__ loops, partitioning list among children
On Sat, Aug 23, 2008 at 6:47 AM, Eric Abrahamsen [EMAIL PROTECTED] wrote: At first I thought the bisect module was the way to go, but it is too tightly tied to integer list indices, and works very awkwardly when bisecting on datetime attributes. I'm not sure what the problem is with bisect (to find the starting point). Your list of model elements has integer indices. I think you have to write a __cmp__ method for your model class that compares on the datetime attribute, then it should work. You can also create a list of (key, model) and sort/search that. http://www.mail-archive.com/[EMAIL PROTECTED]/msg189443.html List slices produce copies, unless I'm mistaken. List slices copy the references in the list, not the values. I.e. it is not a deep copy. So you will have only one copy of each model instance even if you slice up the list. So, unless your lists are very large or you have many levels in your hierarchy, this might work. Kent ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor