On Jul 19, 1:43 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> "larry.mart...@gmail.com" <larry.mart...@gmail.com> writes:
> > Thanks for the reply Paul. I had not heard of itertools. It sounds
> > like just what I need for this. But I am having 1 issue - how do you
> > know how many items are in each group?
>
> Simplest is:
>
>     for key, group in groupby(xs, lambda x: (x[-1], x[4], x[5])):
>         gs = list(group)  # convert iterator to a list
>         n = len(gs)       # this is the number of elements
>
> There is some theoretical inelegance in that it requires each group to
> fit in memory, but you weren't really going to have billions of files
> with the same basename.
>
> If you're not used to iterators and itertools, note there are some
> subtleties to using groupby to iterate over files, because an iterator
> actually has state. It bumps a pointer and maybe consumes some input
> every time you advance it. In a situation like the above, you've got
> some nested iterators (the groupby iterator generating groups, and the
> individual group iterators that come out of the groupby) that wrap the
> same file handle, so bad confusion can result if you advance both
> iterators without being careful (one can consume file input that you
> thought would go to another).
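The quoted pattern above can be sketched end to end. The records and field layout here are made up for illustration (the real data is file metadata; only the key fields `x[-1]`, `x[4]`, `x[5]` come from the quoted code); note that `groupby` only groups *adjacent* items, so the input must be sorted by the same key first:

```python
from itertools import groupby

# Hypothetical records: tuples whose fields 4, 5, and -1 form the group key,
# matching the lambda in the quoted example.
xs = [
    ("a.txt", 0, 0, 0, "d1", "t1", "g1"),
    ("b.txt", 0, 0, 0, "d1", "t1", "g1"),
    ("c.txt", 0, 0, 0, "d2", "t2", "g2"),
]

keyfunc = lambda x: (x[-1], x[4], x[5])

# groupby only merges consecutive items with equal keys, so sort first.
xs.sort(key=keyfunc)

for key, group in groupby(xs, keyfunc):
    gs = list(group)   # materialize the group iterator into a list
    n = len(gs)        # number of elements in this group
    print(key, n)
```

The sort step matters: with unsorted input, `groupby` would emit a separate (smaller) group each time the key changed.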
It seems that if you do a list(group), you have consumed the group. This screwed me up for a while, and it seems very counter-intuitive.

> This isn't as bad as it sounds once you get used to it, but it can be
> a source of frustration at first.
>
> BTW, if you just want to count the elements of an iterator (while
> consuming it),
>
>     n = sum(1 for x in xs)
>
> counts the elements of xs without having to expand it into an in-memory
> list.
>
> Itertools really makes Python feel a lot more expressive and clean,
> despite little kinks like the above.
--
http://mail.python.org/mailman/listinfo/python-list
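Both points in this exchange — that a group iterator is exhausted once consumed, and the `sum(1 for x in xs)` counting idiom — can be checked in a few lines. The sample data here is invented for the demonstration:

```python
from itertools import groupby

pairs = [("a", 1), ("a", 2), ("b", 3)]  # already sorted by key

for key, group in groupby(pairs, lambda p: p[0]):
    gs = list(group)          # this consumes the group iterator...
    leftover = list(group)    # ...so a second pass yields nothing
    assert leftover == []

    # Counting idiom from the quoted reply: counts while consuming,
    # without building an in-memory list.
    n = sum(1 for x in iter(gs))
    assert n == len(gs)
```

The same exhaustion behavior applies to any Python iterator, not just `groupby` groups, which is why it trips people up coming from list-based code.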