"Raymond Hettinger" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Proposal > -------- > I am gathering data to evaluate a request for an alternate version of > itertools.izip() with a None fill-in feature like that for the built-in > map() function: > > >>> map(None, 'abc', '12345') # demonstrate map's None fill-in feature > [('a', '1'), ('b', '2'), ('c', '3'), (None, '4'), (None, '5')] > > The motivation is to provide a means for looping over all data elements > when the input lengths are unequal. The question of the day is whether > that is both a common need and a good approach to real-world problems. > The answer can likely be found in results from other programming > languages and from surveying real-world Python code. > > Other languages > --------------- > I scanned the docs for Haskell, SML, and Perl6's yen operator and found > that the norm for map() and zip() is to truncate to the shortest input > or raise an exception for unequal input lengths. Ruby takes the > opposite approach and fills-in nil values -- the reasoning behind the > design choice is somewhat inscrutable: > http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-dev/18651
From what I can make out (with the help of internet language translation
sites), the relevant part (section [2]) of this presents three options for
handling unequal-length arguments:

1. zip to longest (Perl6 does it this way)
2. zip to shortest (Python does it this way)
3. use the zip method and choose depending on whether the argument list
   is shorter or longer than the object's list.

It then solicits opinions on the best way. It does not state or justify
any particular choice. If "perl6" == "perl6 yen operator", then there is
a contradiction with your earlier statement.

> Real-world code
> ---------------
> I scanned the standard library, my own code, and a few third-party
> tools. I found no instances where map's fill-in feature was used.
>
> History of zip()
> ----------------
> PEP 201 (lock-step iteration) documents that a fill-in feature was
> contemplated and rejected for the zip() built-in introduced in Py2.0.
> In the years before and after, SourceForge logs show no requests for a
> fill-in feature.

My perception is that many people view the process of advocating for a
library addition as

1. Very time consuming, due to the large amount of work involved in
   presenting and defending a proposal.
2. Having a very small chance of acceptance.

I do not know whether this is really the case, or even whether my
perception is correct, but if it is, it could account for the lack of
feature requests.

> Request for more information
> ----------------------------
> My request for readers of comp.lang.python is to search your own code
> to see if map's None fill-in feature was ever used in real-world code
> (not toy examples). I'm curious about the context, how it was used,
> and what alternatives were rejected (i.e. did the fill-in feature
> improve the code). Likewise, I'm curious as to whether anyone has seen
> a zip-style fill-in feature employed to good effect in some other
> programming language.
How well correlated is the use of map()-with-fill with the (need for the)
use of zip/izip-with-fill?

> Parallel to SQL?
> ----------------
> If an iterator element's ordinal position were considered as a record
> key, then the proposal equates to a database-style full outer join
> operation (one which includes unmatched keys in the result) where record
> order is significant. Does an outer-join have anything to do with
> lock-step iteration? Is this a fundamental looping construct or just a
> theoretical wish-list item? Does Python need itertools.izip_longest()
> or would it just become a distracting piece of cruft?
>
> Raymond Hettinger
>
> FWIW, the OP's use case involved printing files in multiple
> columns:
>
> for f, g in itertools.izip_longest(file1, file2, fillin_value=''):
>     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())
>
> The alternative was straightforward but less terse:
>
> while 1:
>     f = file1.readline()
>     g = file2.readline()
>     if not f and not g:
>         break
>     print '%-20s\t|\t%-20s' % (f.rstrip(), g.rstrip())

Actually, my use case did not have quite so much perlish line noise :-)
Compared to

    for f, g in izip2(file1, file2, fill=''):
        print '%s\t%s' % (f, g)

the above looks like a relatively minor loss of conciseness, but consider
the uses of the current izip. For example,

    for i1, i2 in itertools.izip(iterable_1, iterable_2):
        print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

can be replaced by:

    while 1:
        i1 = iterable_1.next()
        i2 = iterable_2.next()
        print '%-20s\t|\t%-20s' % (i1.rstrip(), i2.rstrip())

yet that was not justification for rejecting izip()'s inclusion in
itertools.

The other use case I had was a simple file diff. All I cared about was
whether the files were the same or not and, if not, what the first
differing lines were. This was to compare output from a process that was
supposed to match some saved reference data. Because of error
propagation, lines beyond the first difference were meaningless.
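For concreteness, here is a minimal sketch of the "iterate to longest with fill" behaviour under discussion. The name izip_long and the fill keyword simply follow the usage in this post; they are hypothetical and not an itertools API. The sketch is written in Python 3 syntax so that it can be run as-is.

```python
def izip_long(*iterables, fill=None):
    """Yield tuples until the longest input is exhausted, padding with `fill`.

    Hypothetical helper: the name and the `fill` keyword are assumptions
    taken from the discussion, not part of any library.
    """
    iterators = [iter(it) for it in iterables]
    exhausted = object()  # private marker that cannot collide with real data
    while True:
        # Pull one item from every input; finished inputs report the marker.
        row = [next(it, exhausted) for it in iterators]
        if all(item is exhausted for item in row):
            return  # every input has run out, so stop
        yield tuple(fill if item is exhausted else item for item in row)
```

Called as izip_long('abc', '12345'), it pads the shorter input the same way map(None, ...) does in the quoted example.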
The code, using an "iterate to longest with fill" izip, would be roughly:

    # Simple file diff to ident
    for ln1, ln2 in izip_long(file1, file2, fill="<EOF>"):
        if ln1 != ln2:
            break
    if ln1 == ln2:
        print "files are identical"
    else:
        print "files are different"

This same use case occurred again very recently when writing unit tests
to compare the output of a parser with known correct output during
refactoring.

With file iterators one can imagine many potential use cases for izip
but not imap, but there are probably few real uses extant because files
may generally be of different lengths, and there is currently no usable
izip for this case.

[jan09 08:30 utc]
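For comparison, a hedged sketch of the same first-difference check in Python 3, assuming itertools.zip_longest (the fill-in variant that later entered the standard library, first as izip_longest in Python 2.6). The function name first_difference is made up for illustration. It pads with a private sentinel object rather than the string "<EOF>", so a file that really contains that text is not misread, and it returns None for two empty inputs instead of leaving ln1 and ln2 unbound.

```python
from itertools import zip_longest


def first_difference(lines1, lines2):
    """Return (line_number, line1, line2) for the first mismatch, else None.

    A side that ran out of lines is reported as None in the result.
    """
    eof = object()  # sentinel that no real line can be equal to
    pairs = zip_longest(lines1, lines2, fillvalue=eof)
    for number, (ln1, ln2) in enumerate(pairs, 1):
        if ln1 != ln2:
            return (number,
                    None if ln1 is eof else ln1,
                    None if ln2 is eof else ln2)
    return None  # same lines and same length
```

With two open files this would be called as first_difference(file1, file2); any iterables of lines work equally well, which keeps the check easy to use from a unit test.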