I have no objection to adding a zip_strict() or zip_exact() to itertools. I am used to the current behavior, and am apparently in the minority in not usually assuming common-length iterators. Say +0 on a new function.
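Here is a minimal sketch of such a function. It assumes the function should behave like zip() but raise ValueError when the inputs turn out to have different lengths. zip_strict() is hypothetical and not part of itertools, and so are the exception type and message. The sketch is built on itertools.zip_longest() with a private sentinel, which also shows one way zip_longest() can cover the "just raise" case.

```python
from itertools import zip_longest

# Private sentinel: unlike a fillvalue such as '' or None, it can never
# be confused with a real element of the input iterables.
_MISSING = object()


def zip_strict(*iterables):
    """Hypothetical zip_strict(): like zip(), but raise ValueError if the
    inputs turn out to have different lengths."""
    for combo in zip_longest(*iterables, fillvalue=_MISSING):
        # zip_longest() pads exhausted inputs with the sentinel, so seeing
        # it means at least one input ran out before the others.
        if any(item is _MISSING for item in combo):
            raise ValueError("zip_strict() arguments have different lengths")
        yield combo
```

The function is lazy, so the error is raised only when the mismatch is reached, after the matching tuples have already been yielded. list(zip_strict([1, 2], "ab")) gives [(1, 'a'), (2, 'b')], while list(zip_strict([1, 2, 3], "ab")) raises ValueError.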
But I'm definitely -1 on adding a mode switch to the built-in. That is not how Python usually does things. zip_longest() is a clear example, but so is the recent cut_suffix (or whatever final spelling was chosen). Some folks wanted a mode switch on .rstrip(), and that was appropriately rejected. If zip_strict() is genuinely what you want, importing it from the stdlib is not much effort. My belief is that people who think they want this usually actually want zip_longest(), but that's up to them.

On Sat, Apr 25, 2020, 12:43 PM Christopher Barker <python...@gmail.com> wrote:

> On Sat, Apr 25, 2020 at 7:43 AM Steven D'Aprano <st...@pearwood.info>
> wrote:
>
>> I think that the "correct" (simplest, easiest, most obvious, most
>> flexible) way is:
>>
>> with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>>     for lineA, lineB, lineC in zip_longest(a, b, c, fillvalue=''):
>>         do_something_with(lineA, lineB, lineC)
>>
>> ...
>
>> Especially if the files differ in how many newlines they end with. E.g.
>> files a.txt and c.txt end with a newline, but b.txt ends without one, or
>> ends with an extra blank line at the end.
>>
>> File handling code ought to be resilient in the face of such meaningless
>> differences,
>
> Sure. But which differences are "meaningless" depends on the use case. For
> instance, comments or blank lines in the middle of a file may be a
> meaningless difference, and you'd want to handle that before zipping
> anyway. The way I've solved these types of issues in the past is to filter
> the files first, maybe something like:
>
> with open('a.txt') as a, open('b.txt') as b, open('c.txt') as c:
>     for lineA, lineB, lineC in zip(filtered(a),
>                                    filtered(b),
>                                    filtered(c), strict=True):
>         do_something_with(lineA, lineB, lineC)
>
>> So my argument is that anything you want zip_strict for is better
>> handled with zip_longest -- including the case of just raising.
>>
>
> That is quite the leap!
> You make a decent case about handling empty lines in files, but extending
> that to "anything" is unwarranted.
>
> I honestly do not understand the resistance here. Yes, any change to the
> standard library should be carefully considered, any change IS a
> disruption, and this proposed change may not be worth it. But the argument
> that it wouldn't ever be useful, I just don't get.
>
> Entirely anecdotal evidence here, but I think this is borne out by the
> comments in this thread.
>
> * Many people are surprised when they first discover that zip() stops at
>   the shortest input and silently ignores the rest -- I know I was.
> * Many uses (most?) do expect the iterators to be of equal length.
>   - The main exception to this may be when one of them is infinite, but
>     how common is that, really? Remember that when zip was first created
>     (py2) it was a list builder, not an iterator, and Python itself was
>     much less iterable-focused.
> * However, many uses work fine without any length-checking -- that is
>   often taken care of elsewhere in the code. This is kinda-sorta analogous
>   to a lack of type checking: sure, you COULD get errors, but you usually
>   don't.
>
> We've done fine for years with zip's current behavior, but that doesn't
> mean it couldn't be a little better and safer for a lot of use cases, and
> a number of folks on this thread have said that they would use it.
>
> So: if this were added, it would get some use. How much? Hard to know. Is
> it critically important? Absolutely not. But it's fully backward
> compatible and not a language change, so the barrier to entry is not all
> that high.
>
> However, I agree with (I think) Brandt that the lack of a critical need
> means that a zip_strict() in itertools would get a LOT less use than a
> flag on zip itself -- so I advocate for that. If folks think extending
> zip() is not worth it, then I don't think it would be worth bothering
> with adding a zip_strict() to itertools at all.
>
> -CHB
>
> --
> Christopher Barker, PhD
>
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2X74JUYM3OF5LGEIWRMS4HTWPTKHX53D/
> Code of Conduct: http://python.org/psf/codeofconduct/
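The file-handling example in the quoted exchange can be tried without real files. This is a sketch that makes a few assumptions. In-memory lists of lines stand in for the open file objects; iterating a text file also yields lines with their trailing newlines. The filtered() helper is hypothetical: here it drops blank lines and '#' comment lines, whereas the quoted message leaves it unspecified.

```python
from itertools import zip_longest


def filtered(lines):
    """Hypothetical helper: yield lines with surrounding whitespace stripped,
    skipping blank lines and '#' comment lines."""
    for line in lines:
        text = line.strip()
        if text and not text.startswith("#"):
            yield text


# In-memory stand-ins for the three files.
a = "x1\ny1\n".splitlines(keepends=True)        # ends with a newline
b = "x2\ny2\n\n".splitlines(keepends=True)      # extra blank line at the end
c = "x3\n# note\ny3".splitlines(keepends=True)  # a comment, no final newline

# zip() stops at the shortest input. Here the comment in c also shifts the
# second row out of alignment, and nothing reports either problem.
raw = list(zip(a, b, c))

# zip_longest() keeps going and pads the exhausted input with ''.
padded = list(zip_longest(a, b, c, fillvalue=""))

# Filtering first removes the differences that don't matter for this data,
# so the rows line up again.
cleaned = list(zip(filtered(a), filtered(b), filtered(c)))

for name, rows in [("raw", raw), ("padded", padded), ("cleaned", cleaned)]:
    print(name, rows)
```

Neither plain zip() nor zip_longest() fixes the misalignment caused by the comment line. As the quoted message says, that has to be handled before zipping.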