subject:"itertools.izip brokeness"

Re: itertools.izip brokeness

2006-01-10 Thread Antoon Pardon

Op 2006-01-10, Bengt Richter schreef <[EMAIL PROTECTED]>:
> On 9 Jan 2006 08:19:21 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
>
>>Op 2006-01-05, Bengt Richter schreef <[EMAIL PROTECTED]>:
>>> On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
> [...]
>>> But you can fix that (only test is what you see ;-) :
>>
>>Maybe, but not with this version.
>>
>>> >>> from itertools import repeat, chain, izip
>>> >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")),
>>> ...           chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>> >>> for t in it: print t
>>>  ...
>>>  (3, 11)
>>>  (5, 22)
>>>  (8, 'Bye')
>>>
>>> (Feel free to generalize ;-)
>>
>>The problem with this version is that it will stop if for some reason
>>each iterable contains a 'Bye' at the same place. Now this may seem
>>far fetched at first. But consider that if data is collected from
> ISTM the job of choosing an appropriate sentinel involves making
> that not only far fetched but well-nigh impossible ;-)

>>experiments certain values may be missing. This can be indicated
>>by a special "Missing Data" value in an iterable. But this "Missing
>>Data" value would also be the prime candidate for a fill parameter
>>when an iterable is exhausted.
>>
> ISTM that confuses "missing data" with "end of data stream."

"end of data stream" implies "missing data". If I'm doing experiments
with a number of materials at a number of temperatures and I want
to compare how copper, iron and lead behaved, then when I compare
the results for 400K and there is no data for lead, I don't care
whether that is because the measurement for 400K was somehow
lost or unusable or because they stopped the lead measurements at 350K.

It all boils down to no data for lead at 400K; there is no need for
the processing unit to differentiate between the different reasons for
the missing data. That difference is only useful for the loop control.
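
A minimal sketch of that point in present-day Python 3, assuming `itertools.zip_longest` (a later standard-library function that was not available to the posters) and made-up measurement lists. The names `MISSING`, `copper`, `iron`, `lead` and `present` are hypothetical. The consumer sees the same marker whether a reading was lost or the series simply ended.

```python
from itertools import zip_longest

MISSING = None  # hypothetical "Missing Data" marker, reused as the fill value

# Hypothetical readings at 300K, 350K and 400K.
copper = [1.0, 1.1, 1.2]
iron = [2.0, MISSING, 2.2]   # the 350K reading was lost
lead = [3.0, 3.1]            # measurements stopped at 350K

rows = list(zip_longest(copper, iron, lead, fillvalue=MISSING))

def present(row):
    """Return only the values that were actually measured."""
    return [v for v in row if v is not MISSING]

usable = [present(row) for row in rows]
```

Only the loop that decides when to stop needs to tell the two cases apart, which `zip_longest` handles internally.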

> I assumed your choice of terminating sentinel ("Bye") would not have
> that problem ;-)

That is true, but what is adequate in one situation doesn't need to
be adequate in general.

-- 
Antoon Pardon
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: itertools.izip brokeness

2006-01-09 Thread Bengt Richter

On 9 Jan 2006 08:19:21 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:

>Op 2006-01-05, Bengt Richter schreef <[EMAIL PROTECTED]>:
>> On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
[...]
>> But you can fix that (only test is what you see ;-) :
>
>Maybe, but not with this version.
>
>> >>> from itertools import repeat, chain, izip
>> >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")),
>> ...           chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>> >>> for t in it: print t
>>  ...
>>  (3, 11)
>>  (5, 22)
>>  (8, 'Bye')
>>
>> (Feel free to generalize ;-)
>
>The problem with this version is that it will stop if for some reason
>each iterable contains a 'Bye' at the same place. Now this may seem
>far fetched at first. But consider that if data is collected from
ISTM the job of choosing an appropriate sentinel involves making
that not only far fetched but well-nigh impossible ;-)

>experiments certain values may be missing. This can be indicated
>by a special "Missing Data" value in an iterable. But this "Missing
>Data" value would also be the prime candidate for a fill parameter
>when an iterable is exhausted.
>
ISTM that confuses "missing data" with "end of data stream."
I assumed your choice of terminating sentinel ("Bye") would not have
that problem ;-)

Regards,
Bengt Richter

Re: itertools.izip brokeness

2006-01-09 Thread Antoon Pardon

Op 2006-01-05, Bengt Richter schreef <[EMAIL PROTECTED]>:
> On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
>
>>On 2006-01-04, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>>><[EMAIL PROTECTED]> wrote:
>>>> But here is my real question...
>>>> Why isn't something like this in itertools, or why shouldn't
>>>> it go into itertools?
>>>
>>>
>>>   4) If a need does arise, it can be met by __builtins__.map() or by
>>>  writing:  chain(iterable, repeat(None)).
>>>
>>> Yes, if youre a python guru.  I don't even understand the
>>> code presented in this thread that uses chain/repeat,
>>
>>And it wouldn't work in this case. chain(iterable, repeat(None))
>>changes your iterable into an iterator that first gives you
>>all elements in the iterator and when these are exhausted
>>will continue giving the repeat parameter. e.g.
>>
>>  chain([3,5,8],repeat("Bye"))
>>
>>Will produce  3, 5 and 8 followed by an endless stream
>>of "Bye".
>>
>>But if you do this with all iterables, and you have to
>>because you don't know which one is the smaller, all
>>iterators will be infinite and izip will never stop.
>
> But you can fix that (only test is what you see ;-) :

Maybe, but not with this version.

> >>> from itertools import repeat, chain, izip
> >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")),
> ...           chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
> >>> for t in it: print t
>  ...
>  (3, 11)
>  (5, 22)
>  (8, 'Bye')
>
> (Feel free to generalize ;-)

The problem with this version is that it will stop if for some reason
each iterable contains a 'Bye' at the same place. Now this may seem
far fetched at first. But consider that if data is collected from
experiments certain values may be missing. This can be indicated
by a special "Missing Data" value in an iterable. But this "Missing
Data" value would also be the prime candidate for a fill parameter
when an iterable is exhausted.
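
The collision can be reproduced with a short Python 3 sketch of the same `iter(callable, sentinel)` trick. The helper name `zip_until_all_fill` is hypothetical, and `zip`/`next()` stand in for the thread's `izip`/`.next()`.

```python
from itertools import chain, repeat

def zip_until_all_fill(fill, *iterables):
    """Pad every input with fill and stop at the first all-fill tuple."""
    padded = zip(*(chain(it, repeat(fill)) for it in iterables))
    # iter(callable, sentinel) stops as soon as the callable returns sentinel.
    return iter(lambda: next(padded), (fill,) * len(iterables))

# Works when the fill value never appears in the data ...
clean = list(zip_until_all_fill("Bye", [3, 5, 8], [11, 22]))

# ... but stops early when both inputs hold the fill value at the same index.
truncated = list(zip_until_all_fill("Bye", [3, "Bye", 8], [11, "Bye", 22, 33]))
```

Here `truncated` holds only the first pair; the real data after the coinciding "Bye" values is never reached.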

-- 
Antoon Pardon

Re: itertools.izip brokeness

2006-01-06 Thread Paul Rubin

Steven D'Aprano <[EMAIL PROTECTED]> writes:
> > def izip5(*iterables, fill=None):
> Doesn't work: keyword arguments must be listed before * and ** arguments.

Eh, ok, gotta use **kw.

> def function(*iterators, **kwargs):
>     if kwargs.keys() != ["fill"]:
>         raise ValueError
>     ...
> 
> It might not be the easiest API to extend, but for a special case like
> this, I think it is perfectly usable.

Yeah, that's what the earlier version had.  I tried to bypass it but
as you pointed out, it's a syntax error.  The code I posted also has a
deliberate syntax error (until Python 2.5), namely the use of the new
conditional expression syntax (PEP 308).  That could be worked around
of course.

Re: itertools.izip brokeness

2006-01-06 Thread Steven D'Aprano

On Thu, 05 Jan 2006 23:52:13 -0800, Paul Rubin wrote:

> [EMAIL PROTECTED] writes:
>> def izip4(*iterables, **kw):
>>  """kw:fill. An element that will pad the shorter iterable
>> kw:infinite. Number of non-terminating iterators """
> 
> That's a really kludgy API.  I'm not sure what to propose instead:
> maybe some way of distinguishing which iterables are supposed to be
> iterated til exhaustion (untested):
> 
> class Discardable(object): pass
> 
> def izip5(*iterables, fill=None):

Doesn't work: keyword arguments must be listed before * and ** arguments.

>>> def izip5(*iterables, fill=None):
  File "", line 1
    def izip5(*iterables, fill=None):
                             ^
SyntaxError: invalid syntax

Personally, I don't see anything wrong with an API of

function(*iterators [, fill]):
   Perform function on one or more iterators, with an optional fill
   object.

Of course, this has to be defined in code as:

def function(*iterators, **kwargs):
    if kwargs.keys() != ["fill"]:
        raise ValueError
    ...

It might not be the easiest API to extend, but for a special case like
this, I think it is perfectly usable.
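
For comparison, a hedged sketch of two ways to spell "any number of iterators plus an optional fill". The function names `pad_args` and `pad_args_kwonly` are hypothetical. The first keeps the `**kwargs` style but lets callers omit `fill`; the second uses keyword-only arguments, which exist only in Python 3 (PEP 3102) and so were not an option when this was posted.

```python
def pad_args(*iterators, **kwargs):
    """**kwargs style: accept an optional fill and reject anything else."""
    fill = kwargs.pop("fill", None)
    if kwargs:
        raise TypeError("unexpected keyword arguments: %r" % sorted(kwargs))
    return iterators, fill

def pad_args_kwonly(*iterators, fill=None):
    """Python 3 only: a keyword-only parameter may follow *iterators."""
    return iterators, fill
```

Both accept `pad_args(a, b)` and `pad_args(a, b, fill=0)`; only the first needs to police unknown keywords by hand.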

-- 
Steven.


Re: itertools.izip brokeness

2006-01-06 Thread Bengt Richter

On 5 Jan 2006 14:34:39 -0800, [EMAIL PROTECTED] wrote:

>
>Bengt Richter wrote:
>> On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
>>
>> >On 2006-01-04, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>> >><[EMAIL PROTECTED]> wrote:
>> >>> But here is my real question...
>> >>> Why isn't something like this in itertools, or why shouldn't
>> >>> it go into itertools?
>> >>
>> >>
>> >>   4) If a need does arise, it can be met by __builtins__.map() or by
>> >>  writing:  chain(iterable, repeat(None)).
>> >>
>> >> Yes, if youre a python guru.  I don't even understand the
>> >> code presented in this thread that uses chain/repeat,
>> >
>> >And it wouldn't work in this case. chain(iterable, repeat(None))
>> >changes your iterable into an iterator that first gives you
>> >all elements in the iterator and when these are exhausted
>> >will continue giving the repeat parameter. e.g.
>> >
>> >  chain([3,5,8],repeat("Bye"))
>> >
>> >Will produce  3, 5 and 8 followed by an endless stream
>> >of "Bye".
>> >
>> >But if you do this with all iterables, and you have to
>> >because you don't know which one is the smaller, all
>> >iterators will be infinite and izip will never stop.
>>
>> But you can fix that (only test is what you see ;-) :
>>
>>  >>> from itertools import repeat, chain, izip
>>  >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), 
>> chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>  >>> for t in it: print t
>>  ...
>>  (3, 11)
>>  (5, 22)
>>  (8, 'Bye')
>>
>> (Feel free to generalize ;-)
>
>Which just reinforces my original point: if leaving
>out a feature is justified by the existence of some
>alternate method, then that method must be equally
>obvious as the missing feature, or must be documented
>as an idiom.  Otherwise, the justification fails.
>
>Is the above code as obvious as
>  izip([3,5,8],[11,22],sentinal='Bye')?
>(where the sentinal keyword causes izip to iterate
>to the longest argument.)
>
You are right. I was just responding with a quick fix to the
problem Antoon noted.
For a more flexible izip that includes the above capability, but
is also able to do the default izip and then continue iterating
in the above mode after the normal izip mode stops, see izip2.py in my other
post in this thread.

Regards,
Bengt Richter

Re: itertools.izip brokeness

2006-01-06 Thread Paul Rubin

Paul Rubin  writes:
># quit if only discardables are left
>dropwhile(lambda i,t: (not isinstance(i, Discardable)) and len(t)),
>   izip(t, iterables)).next()

Ehh, that should say dropwhile(lambda (t,i): ...)  to use tuple
unpacking and get the args in the right order.  I'm sleepy and forgot
what I was doing.  Of course I'm still not sure it's right.
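
A side note for readers on Python 3: tuple unpacking in parameter lists, as in `lambda (t,i): ...`, was removed there (PEP 3113), so the predicate has to index its single argument instead. A small sketch with made-up data; `pairs` and `first_empty` are hypothetical names.

```python
from itertools import dropwhile

# Hypothetical (tuple, label) pairs standing in for izip(t, iterables).
pairs = [((1,), "a"), ((), "b"), ((2,), "c")]

# Python 2 allowed `lambda (t, i): len(t) > 0`; Python 3 needs indexing.
first_empty = next(dropwhile(lambda pair: len(pair[0]) > 0, pairs))
```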

Re: itertools.izip brokeness

2006-01-05 Thread Paul Rubin

[EMAIL PROTECTED] writes:
> def izip4(*iterables, **kw):
>  """kw:fill. An element that will pad the shorter iterable
> kw:infinite. Number of non-terminating iterators """

That's a really kludgy API.  I'm not sure what to propose instead:
maybe some way of distinguishing which iterables are supposed to be
iterated til exhaustion (untested):

class Discardable(object): pass

def izip5(*iterables, fill=None):
    """Run until all non-discardable iterators are exhausted"""
    while True:
        # exhausted iterables will put empty tuples into t
        # non-exhausted iterables will put singleton tuples there
        t = [tuple(islice(i,1)) for i in iterables]

        # quit if only discardables are left
        dropwhile(lambda i,t: (not isinstance(i, Discardable)) and len(t)),
                  izip(t, iterables)).next()

        yield tuple([(v[0] if len(t) else fill) for v in t])

Then you'd wrap "infinite" and other iterators you don't need exhausted
in Discardable:

stream = izip5(a, b, Discardable(c), d, Discardable(e), fill='')

runs until a, b, and d are all exhausted.
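
One way the idea could look as runnable Python 3 is sketched below. This is an interpretation, not the code posted above: it assumes `Discardable` stores the wrapped iterable in a hypothetical `iterable` attribute, and it uses a private marker object in place of the `islice`/`dropwhile` approach.

```python
from itertools import count

class Discardable:
    """Marks an input that should not keep the zip alive (assumed design)."""
    def __init__(self, iterable):
        self.iterable = iterable

def izip5(*iterables, fill=None):
    """Run until all non-discardable inputs are exhausted."""
    missing = object()  # private marker, cannot collide with real data
    required = [not isinstance(it, Discardable) for it in iterables]
    iterators = [iter(it.iterable if isinstance(it, Discardable) else it)
                 for it in iterables]
    while True:
        row = [next(it, missing) for it in iterators]
        # Stop once every required input has run dry.
        if all(v is missing for v, req in zip(row, required) if req):
            return
        yield tuple(fill if v is missing else v for v in row)

stream = list(izip5([1, 2, 3], Discardable(count()), [10], fill=""))
```

Note that the final round still consumes one item from each discardable input before the generator stops.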

Re: itertools.izip brokeness

2006-01-05 Thread rurpy


"Michael Spencer" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
>
> > Bengt Richter wrote:
> ...
> >>  >>> from itertools import repeat, chain, izip
> >>  >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), 
> >> chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
> >>  >>> for t in it: print t
> >>  ...
> >>  (3, 11)
> >>  (5, 22)
> >>  (8, 'Bye')
> >>
> >> (Feel free to generalize ;-)
> >
>
> [EMAIL PROTECTED] wrote:
> > Is the above code as obvious as
> >   izip([3,5,8],[11,22],sentinal='Bye')?
> > (where the sentinal keyword causes izip to iterate
> > to the longest argument.)
> >
>
> How about:
>
> from itertools import repeat
>
> def izip2(*iterables, **kw):
>     """kw:fill. An element that will pad the shorter iterable"""
>     fill = repeat(kw.get("fill"))
>     iterables = map(iter, iterables)
>     iters = range(len(iterables))
>
>     for i in range(10):
>         result = []
>         for idx in iters:
>             try:
>                 result.append(iterables[idx].next())
>             except StopIteration:
>                 iterables[idx] = fill
>                 if iterables.count(fill) == len(iterables):
>                     raise
>                 result.append(fill.next())
>         yield tuple(result)
>
>   >>> list(izip2(range(5), range(3), range(8), range(2)))
>   [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, None), (3, None, 3, None),
>    (4, None, 4, None), (None, None, 5, None), (None, None, 6, None),
>    (None, None, 7, None)]
>   >>> list(izip2(range(5), range(3), range(8), range(2), fill="Empty"))
>   [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 'Empty'), (3, 'Empty', 3, 'Empty'),
>    (4, 'Empty', 4, 'Empty'), ('Empty', 'Empty', 5, 'Empty'),
>    ('Empty', 'Empty', 6, 'Empty'), ('Empty', 'Empty', 7, 'Empty')]
>   >>>

This may be getting too kludgey but by counting the
exhausted iterators you can allow for arguments
containing infinite iterators:

def izip4(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable
    kw:infinite. Number of non-terminating iterators """
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))
    finite_cnt = len(iterables) - kw.get("infinite", 0)

    while True:
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                finite_cnt -= 1
                if finite_cnt == 0:
                    raise
                result.append(fill.next())
        yield tuple(result)

>>> print list(izip4(range(5), range(3), range(8), range(2), fill='empty'))
[(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 'empty'), (3, 'empty', 3, 'empty'),
 (4, 'empty', 4, 'empty'), ('empty', 'empty', 5, 'empty'),
 ('empty', 'empty', 6, 'empty'), ('empty', 'empty', 7, 'empty')]

>>> print list(izip4(range(5), repeat('foo'), range(8), count(), infinite=2,
...                  fill='empty'))
[(0, 'foo', 0, 0), (1, 'foo', 1, 1), (2, 'foo', 2, 2), (3, 'foo', 3, 3),
 (4, 'foo', 4, 4), ('empty', 'foo', 5, 5), ('empty', 'foo', 6, 6),
 ('empty', 'foo', 7, 7)]
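
For anyone trying this on Python 3, here is a hedged port of the same counting idea. The name `zip_fill_counting` is hypothetical. Two things differ from izip4 above: a generator must now end with `return` rather than by letting `StopIteration` escape (PEP 479), and an `exhausted` list replaces the trick of swapping in the `fill` iterator.

```python
from itertools import count, repeat

def zip_fill_counting(*iterables, fill=None, infinite=0):
    """Pad short inputs with fill; stop when the last finite input ends.

    `infinite` is the caller's promise of how many inputs never terminate.
    """
    iterators = [iter(it) for it in iterables]
    if not iterators:
        return
    exhausted = [False] * len(iterators)
    finite_left = len(iterators) - infinite
    while True:
        row = []
        for idx, it in enumerate(iterators):
            if exhausted[idx]:
                row.append(fill)
                continue
            try:
                row.append(next(it))
            except StopIteration:
                exhausted[idx] = True
                finite_left -= 1
                if finite_left == 0:
                    return  # last finite input is done; drop the partial row
                row.append(fill)
        yield tuple(row)

mixed = list(zip_fill_counting(range(5), repeat('foo'), range(8), count(),
                               infinite=2, fill='empty'))
```

As with izip4, the caller has to get the `infinite` count right; if it is too small the generator never stops.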


Re: itertools.izip brokeness

2006-01-05 Thread Michael Spencer

Paul Rubin wrote:
> Michael Spencer <[EMAIL PROTECTED]> writes:
>>  for i in range(10):
>>      result = []
>>      ...
> 
> Do you mean "while True: ..."?
> 
oops, yes!

so, this should have been:

from itertools import repeat

def izip2(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable"""
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))

    while True:
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                if iterables.count(fill) == len(iterables):
                    raise
                result.append(fill.next())
        yield tuple(result)

Michael


Re: itertools.izip brokeness

2006-01-05 Thread Paul Rubin

Michael Spencer <[EMAIL PROTECTED]> writes:
>  for i in range(10):
>      result = []
>      ...

Do you mean "while True: ..."?


> def izip2(*iterables, **kw):
>  """kw:fill. An element that will pad the shorter iterable"""
>  fill = repeat(kw.get("fill"))

Yet another attempt (untested, uses Python 2.5 conditional expression):

from itertools import chain, repeat, dropwhile
def izip2(*iterables, **kw):
    fill = kw.get('fill')
    sentinel = object()
    iterables = [chain(i, repeat(sentinel)) for i in iterables]
    while True:
        t = [i.next() for i in iterables]

        # raise StopIteration immediately if all iterators are now empty
        dropwhile(lambda v: v is sentinel, t).next()

        # map all sentinels to the fill value and yield resulting tuple
        yield tuple([(v if v is not sentinel else fill) for v in t])
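
The private-sentinel approach carries over to Python 3 with one change: since PEP 479, a `StopIteration` escaping inside a generator becomes a `RuntimeError`, so the `dropwhile(...).next()` exit has to be an explicit `return`. A minimal sketch; the name `zip_fill` is hypothetical.

```python
from itertools import chain, repeat

def zip_fill(*iterables, fill=None):
    """Zip to the longest input, padding with fill."""
    sentinel = object()  # unique object, so no data value can equal it
    padded = [chain(it, repeat(sentinel)) for it in iterables]
    while True:
        row = [next(it) for it in padded]
        # All inputs exhausted (or no inputs at all): end the generator.
        if all(v is sentinel for v in row):
            return
        yield tuple(fill if v is sentinel else v for v in row)

plain = list(zip_fill([3, 5, 8], [11, 22], fill="Bye"))
# Data that equals the fill value no longer ends the iteration early.
tricky = list(zip_fill(["Bye"], ["Bye", "x"], fill="Bye"))
```

Because the end-of-input test compares identity against a private object, the caller's fill value is free to coincide with real data.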

Re: itertools.izip brokeness

2006-01-05 Thread Michael Spencer


> Bengt Richter wrote:
...
>>  >>> from itertools import repeat, chain, izip
>>  >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), 
>> chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>>  >>> for t in it: print t
>>  ...
>>  (3, 11)
>>  (5, 22)
>>  (8, 'Bye')
>>
>> (Feel free to generalize ;-)
> 

[EMAIL PROTECTED] wrote:
> Is the above code as obvious as
>   izip([3,5,8],[11,22],sentinal='Bye')?
> (where the sentinal keyword causes izip to iterate
> to the longest argument.)
> 

How about:

from itertools import repeat

def izip2(*iterables, **kw):
    """kw:fill. An element that will pad the shorter iterable"""
    fill = repeat(kw.get("fill"))
    iterables = map(iter, iterables)
    iters = range(len(iterables))

    for i in range(10):
        result = []
        for idx in iters:
            try:
                result.append(iterables[idx].next())
            except StopIteration:
                iterables[idx] = fill
                if iterables.count(fill) == len(iterables):
                    raise
                result.append(fill.next())
        yield tuple(result)

  >>> list(izip2(range(5), range(3), range(8), range(2)))
  [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, None), (3, None, 3, None),
   (4, None, 4, None), (None, None, 5, None), (None, None, 6, None),
   (None, None, 7, None)]
  >>> list(izip2(range(5), range(3), range(8), range(2), fill="Empty"))
  [(0, 0, 0, 0), (1, 1, 1, 1), (2, 2, 2, 'Empty'), (3, 'Empty', 3, 'Empty'),
   (4, 'Empty', 4, 'Empty'), ('Empty', 'Empty', 5, 'Empty'),
   ('Empty', 'Empty', 6, 'Empty'), ('Empty', 'Empty', 7, 'Empty')]
  >>>

Michael



Re: itertools.izip brokeness

2006-01-05 Thread Bengt Richter

On Thu, 05 Jan 2006 07:42:25 GMT, [EMAIL PROTECTED] (Bengt Richter) wrote:

>On 4 Jan 2006 15:20:43 -0800, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote:
>
[ ... 5 options enumerated ... ]
>>
>>
>6. Could modify izip so that one could write
>
>    from itertools import izip
>    zipit = izip(*seqs)  # bind iterator object to preserve access to its state later
>    for tup in zipit:
>        # do something with tup as now produced
>    for tup in zipit.rest(sentinel):
>        # tup starts with the tuple that would have been returned if all sequences
>        # had been sampled and sentinel substituted where StopIteration happened.
>        # continuing until but not including (sentinel,)*len(seqs)
>
>This would seem backwards compatible, and also potentially allow you to use
>the rest mode from the start, as in
>
>    for tup in izip(*seqs).rest(sentinel):
>        # process tup and notice sentinel for yourself
>
Demo-of-concept hack: only tested as you see below

----< izip2.py >----
class izip2(object):
    """
    works like itertools.izip except that
    if a reference (e.g. it) to the stopped iterator is preserved,
    it.rest(sentinel) returns an iterator that will continue
    to return tuples with sentinel substituted for items from
    exhausted sequences, until all sequences are exhausted.
    """
    FIRST, FIRST_STOP, FIRST_REST, REST, REST_STOP = xrange(5)
    def __init__(self, *seqs):
        self.iters = map(iter, seqs)
        self.restmode = self.FIRST
    def __iter__(self): return self
    def next(self):
        if not self.iters: raise StopIteration
        if self.restmode == self.FIRST:
            tup=[]
            try:
                for i, it in enumerate(self.iters):
                    tup.append(it.next())
                return tuple(tup)
            except StopIteration:
                self.restmode = self.FIRST_STOP # stopped, not rest-restarted
                self.tup=tup;self.i=i
                raise
        elif self.restmode==self.FIRST_STOP:  # normal part exhausted
            raise StopIteration
        elif self.restmode in (self.FIRST_REST, self.REST):
            if self.restmode == self.FIRST_REST:
                tup = self.tup # saved
                self.restmode = self.REST
            else:
                tup=[]
                for it in self.iters:
                    try: tup.append(it.next())
                    except StopIteration: tup.append(self.sentinel)
            tup = tuple(tup)
            if tup==(self.sentinel,)*len(self.iters):
                self.restmode = self.REST_STOP
                raise StopIteration
            return tuple(tup)
        elif self.restmode==self.REST_STOP:  # rest part exhausted
            raise StopIteration
        else:
            raise RuntimeError('Bad restmode: %r'%self.restmode)
    def rest(self, sentinel=''):
        self.sentinel = sentinel
        if self.restmode==self.FIRST: # prior to any sequence end
            self.restmode = self.REST
            return self
        self.tup.append(sentinel)
        for it in self.iters[self.i+1:]:
            try: self.tup.append(it.next())
            except StopIteration: self.tup.append(sentinel)
        self.restmode = self.FIRST_REST
        return self

def test():
    assert list(izip2())==[]
    assert list(izip2().rest(''))==[]
    it = izip2('ab', '1')
    assert list(it)==[('a', '1')]
    assert list(it.rest())==[('b', '')]
    it = izip2('a', '12')
    assert list(it)==[('a', '1')]
    assert list(it.rest())==[('', '2')]
    it = izip2('ab', '12')
    assert list(it)==[('a', '1'), ('b', '2')]
    assert list(it.rest())==[]
    it = izip2(xrange(3), (11,22), 'abcd')
    assert list(it) == [(0, 11, 'a'), (1, 22, 'b')]
    assert list(it.rest(None)) == [(2, None, 'c'), (None, None, 'd')]
    print 'test passed'

if __name__ == '__main__': test()
----

Using this, Antoon's example becomes:

 >>> from izip2 import izip2
 >>> it = izip2([3,5,8], [11,22])
 >>> for t in it: print t
 ...
 (3, 11)
 (5, 22)
 >>> for t in it.rest('Bye'): print t
 ...
 (8, 'Bye')
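
A compact Python 3 sketch of the same two-phase idea follows, for readers who cannot run the Python 2 class. It is a simplified variant, not a port: the class name `ZipWithRest` is hypothetical, `rest()` is written as a generator, and a private marker decides when to stop, so a row made entirely of real values equal to the sentinel does not end the iteration.

```python
class ZipWithRest:
    """Zip to the shortest input, then optionally continue to the longest."""

    def __init__(self, *iterables):
        self._iterators = [iter(it) for it in iterables]
        self._partial = None   # items consumed in the round that hit exhaustion
        self._stopped = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._stopped or not self._iterators:
            raise StopIteration
        row = []
        for it in self._iterators:
            try:
                row.append(next(it))
            except StopIteration:
                self._stopped = True
                self._partial = row   # keep what an ordinary zip would discard
                raise
        return tuple(row)

    def rest(self, sentinel=None):
        """Yield the remaining rows, padding exhausted inputs with sentinel."""
        missing = object()   # private marker for "this input is exhausted"
        if self._partial is not None:
            done = len(self._partial)
            row = (self._partial + [missing]
                   + [next(it, missing) for it in self._iterators[done + 1:]])
            self._partial = None
        else:
            row = [next(it, missing) for it in self._iterators]
        while any(v is not missing for v in row):
            yield tuple(sentinel if v is missing else v for v in row)
            row = [next(it, missing) for it in self._iterators]

it = ZipWithRest([3, 5, 8], [11, 22])
normal_part = list(it)
rest_part = list(it.rest("Bye"))
```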

Want to make an efficient C version, Raymond? ;-)

Regards,
Bengt Richter

Re: itertools.izip brokeness

2006-01-05 Thread rurpy


Bengt Richter wrote:
> On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:
>
> >On 2006-01-04, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> >><[EMAIL PROTECTED]> wrote:
> >>> But here is my real question...
> >>> Why isn't something like this in itertools, or why shouldn't
> >>> it go into itertools?
> >>
> >>
> >>   4) If a need does arise, it can be met by __builtins__.map() or by
> >>  writing:  chain(iterable, repeat(None)).
> >>
> >> Yes, if youre a python guru.  I don't even understand the
> >> code presented in this thread that uses chain/repeat,
> >
> >And it wouldn't work in this case. chain(iterable, repeat(None))
> >changes your iterable into an iterator that first gives you
> >all elements in the iterator and when these are exhausted
> >will continue giving the repeat parameter. e.g.
> >
> >  chain([3,5,8],repeat("Bye"))
> >
> >Will produce  3, 5 and 8 followed by an endless stream
> >of "Bye".
> >
> >But if you do this with all iterables, and you have to
> >because you don't know which one is the smaller, all
> >iterators will be infinite and izip will never stop.
>
> But you can fix that (only test is what you see ;-) :
>
>  >>> from itertools import repeat, chain, izip
>  >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")), 
> chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
>  >>> for t in it: print t
>  ...
>  (3, 11)
>  (5, 22)
>  (8, 'Bye')
>
> (Feel free to generalize ;-)

Which just reinforces my original point: if leaving
out a feature is justified by the existence of some
alternate method, then that method must be equally
obvious as the missing feature, or must be documented
as an idiom.  Otherwise, the justification fails.

Is the above code as obvious as
  izip([3,5,8],[11,22],sentinal='Bye')?
(where the sentinal keyword causes izip to iterate
to the longest argument.)


Re: itertools.izip brokeness

2006-01-05 Thread Bengt Richter

On 5 Jan 2006 15:48:26 GMT, Antoon Pardon <[EMAIL PROTECTED]> wrote:

>On 2006-01-04, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
>><[EMAIL PROTECTED]> wrote:
>>> But here is my real question...
>>> Why isn't something like this in itertools, or why shouldn't
>>> it go into itertools?
>>
>>
>>   4) If a need does arise, it can be met by __builtins__.map() or by
>>  writing:  chain(iterable, repeat(None)).
>>
>> Yes, if youre a python guru.  I don't even understand the
>> code presented in this thread that uses chain/repeat,
>
>And it wouldn't work in this case. chain(iterable, repeat(None))
>changes your iterable into an iterator that first gives you
>all elements in the iterator and when these are exhausted
>will continue giving the repeat parameter. e.g.
>
>  chain([3,5,8],repeat("Bye"))
>
>Will produce  3, 5 and 8 followed by an endless stream
>of "Bye".
>
>But if you do this with all iterables, and you have to
>because you don't know which one is the smaller, all
>iterators will be infinite and izip will never stop.

But you can fix that (only test is what you see ;-) :

 >>> from itertools import repeat, chain, izip
 >>> it = iter(lambda z=izip(chain([3,5,8],repeat("Bye")),
 ...           chain([11,22],repeat("Bye"))):z.next(), ("Bye","Bye"))
 >>> for t in it: print t
 ...
 (3, 11)
 (5, 22)
 (8, 'Bye')

(Feel free to generalize ;-)

Regards,
Bengt Richter

Re: itertools.izip brokeness

2006-01-05 Thread Antoon Pardon

On 2006-01-04, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
><[EMAIL PROTECTED]> wrote:
>> But here is my real question...
>> Why isn't something like this in itertools, or why shouldn't
>> it go into itertools?
>
>
>   4) If a need does arise, it can be met by __builtins__.map() or by
>  writing:  chain(iterable, repeat(None)).
>
> Yes, if youre a python guru.  I don't even understand the
> code presented in this thread that uses chain/repeat,

And it wouldn't work in this case. chain(iterable, repeat(None))
changes your iterable into an iterator that first gives you
all elements in the iterator and when these are exhausted
will continue giving the repeat parameter. e.g.

  chain([3,5,8],repeat("Bye"))

Will produce  3, 5 and 8 followed by an endless stream
of "Bye".

But if you do this with all iterables, and you have to
because you don't know which one is the smaller, all
iterators will be infinite and izip will never stop.
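
Both effects are easy to see with `islice`, which takes a bounded number of items from an endless iterator. A short Python 3 sketch, with the built-in `zip` standing in for `izip`; the variable names are hypothetical.

```python
from itertools import chain, islice, repeat

# A padded iterator: the real items first, then "Bye" forever.
padded = chain([3, 5, 8], repeat("Bye"))
first_six = list(islice(padded, 6))

# Zipping two padded iterators never ends on its own, so take only five rows.
endless = zip(chain([3, 5, 8], repeat("Bye")),
              chain([11, 22], repeat("Bye")))
first_five = list(islice(endless, 5))
```

After the third row, `endless` would go on producing `("Bye", "Bye")` indefinitely.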

-- 
Antoon Pardon

Re: itertools.izip brokeness

2006-01-04 Thread Bengt Richter

On 4 Jan 2006 15:20:43 -0800, "Raymond Hettinger" <[EMAIL PROTECTED]> wrote:

>Paul Rubin wrote:
>> What do you think of my suggestion of passing an optional arg to the
>> StopIteration constructor, that the caller can use to resume the
>> iterator or take other suitable recovery steps?  Maybe this could
>> interact with PEP 343 in some useful way.
>
>Probably unworkable.  Complex to explain and use.   Makes the API
>heavy.  Hard to assure retro-fitting for every possible kind of
>iterator or iterable.
>
>Am not sure of the best solution:
>
>1. Could add an optional arg to zip()/izip() with a mutable container
>to hold a final, incomplete tuple:  final=[]; zip(a,b,leftover=final).
>This approach is kludgy and unlikely to lead to beautiful code, but it
>does at least make accessible data that would otherwise be tossed.
>
>2. Could add a new function with None fill-in -- essentially an
>iterator version of map(None, a,b).  Instead of None, a user specified
>default value would be helpful in cases where the input data stream
>could potentially have None as a valid data element.  The function
>would also need periodic signal checks to make it possible to break-out
>if one of the inputs is infinite. How or whether such a function would be
>used can likely be answered by mining real-world code for cases where
>map's None fill-in feature was used.
>
>3. Could point people to the roundrobin() recipe in the
>collections.deque docs -- it solves a closely related problem but is
>not exactly what the OP needed (his use case required knowing which
>iterator gave birth to each datum).
>
>4. Could punt and leave this for straight-forward while-loop coding.
>Though the use case seems like it would be common, there may be a
>reason this hasn't come up since zip() was introduced way back in
>Py2.0.
>
>5. Could create an iterator wrapper that remembers its last accessed
>item and whether StopIteration has been raised.  While less direct than
>a customized zip method, the wrapper may be useful in contexts other
>than zipping -- essentially, anywhere it is inconvenient to have just
>consumed an iterator element.  Testing the wrapper object for
>StopIteration would be akin to else-clauses in a for-loop. OTOH, this
>approach is at odds with the notion of side-effect free functional
>programming and the purported benefits of that programming style.
>
>
6. Could modify izip so that one could write

    from itertools import izip
    zipit = izip(*seqs)  # bind iterator object to preserve access to its state later
    for tup in zipit:
        # do something with tup as now produced
    for tup in zipit.rest(sentinel):
        # tup starts with the tuple that would have been returned if all sequences
        # had been sampled and sentinel substituted where StopIteration happened.
        # continuing until but not including (sentinel,)*len(seqs)

This would seem backwards compatible, and also potentially allow you to use the
rest mode from the start, as in

    for tup in izip(*seqs).rest(sentinel):
        # process tup and notice sentinel for yourself


Regards,
Bengt Richter

Re: itertools.izip brokeness

2006-01-04 Thread Raymond Hettinger

Paul Rubin wrote:
> What do you think of my suggestion of passing an optional arg to the
> StopIteration constructor, that the caller can use to resume the
> iterator or take other suitable recovery steps?  Maybe this could
> interact with PEP 343 in some useful way.

Probably unworkable.  Complex to explain and use.   Makes the API
heavy.  Hard to assure retro-fitting for every possible kind of
iterator or iterable.

Am not sure of the best solution:

1. Could add an optional arg to zip()/izip() with a mutable container
to hold a final, incomplete tuple:  final=[]; zip(a,b,leftover=final).
This approach is kludgy and unlikely to lead to beautiful code, but it
does at least make accessible data that would otherwise be tossed.

2. Could add a new function with None fill-in -- essentially an
iterator version of map(None, a,b).  Instead of None, a user specified
default value would be helpful in cases where the input data stream
could potentially have None as a valid data element.  The function
would also need periodic signal checks to make it possible to break-out
if one of the inputs is infinite. How or whether such a function would be
used can likely be answered by mining real-world code for cases where
map's None fill-in feature was used.

3. Could point people to the roundrobin() recipe in the
collections.deque docs -- it solves a closely related problem but is
not exactly what the OP needed (his use case required knowing which
iterator gave birth to each datum).

4. Could punt and leave this for straight-forward while-loop coding.
Though the use case seems like it would be common, there may be a
reason this hasn't come up since zip() was introduced way back in
Py2.0.

5. Could create an iterator wrapper that remembers its last accessed
item and whether StopIteration has been raised.  While less direct than
a customized zip method, the wrapper may be useful in contexts other
than zipping -- essentially, anywhere it is inconvenient to have just
consumed an iterator element.  Testing the wrapper object for
StopIteration would be akin to else-clauses in a for-loop. OTOH, this
approach is at odds with the notion of side-effect free functional
programming and the purported benefits of that programming style.
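
Option 5 can be sketched in a few lines of Python 3. The class name `Remembering` and its attributes `last` and `exhausted` are hypothetical, chosen only to illustrate the idea; the built-in `zip` stands in for `izip`.

```python
class Remembering:
    """Iterator wrapper that records its last item and whether it has ended."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.last = None
        self.exhausted = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            self.last = next(self._it)
        except StopIteration:
            self.exhausted = True
            raise
        return self.last

a = Remembering([1, 2, 3])
b = Remembering([10])
pairs = list(zip(a, b))
# zip pulled 2 from `a` before noticing that `b` was empty; the wrapper
# still has that item, and shows which input ended the iteration.
recovered = a.last
```

After the zip, `b.exhausted` is true and `a.exhausted` is false, so the caller can tell which side ran out and pick up the otherwise lost item.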

Raymond Hettinger


Re: itertools.izip brokeness

2006-01-04 Thread rurpy

> I don't understand this.   Why do you need look ahead?

Just before I posted, I got it (I think) but didn't want to
rewrite everything.  The need for unget() (or peek(), etc)
is to fix the thrown-away-data problem in izip(), right?

As an easier alternative, what about leaving izip() alone
and simply documenting that behavior.  That is, izip()
is not appropriate for use with unequal length iterables
unless you don't care what happens after the shortest,
and the state of the iterables is undefined after izip().

Then have an izip2(), or a flag to izip() that changes its
behavior, that results in iteration to the end of the
longest sequence.

This seems to me clean and symmetrical -- one form
for iteration up to the shortest, the other form iterates
to the longest.
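
For readers on later Pythons: an iterate-to-the-longest tool of this kind
exists as itertools.izip_longest() from 2.6 and as itertools.zip_longest()
in Python 3, with a caller-chosen fillvalue. A short sketch in Python 3
syntax follows; the helper name columns() is only an illustration.

```python
from itertools import zip_longest


def columns(left, right, fill=""):
    """Pair up items from two iterables, padding the shorter with `fill`."""
    return list(zip_longest(left, right, fillvalue=fill))


rows = columns(["a", "b", "c"], ["d", "e"])

# If every ordinary value (including None and "") may occur in the data,
# a private object can serve as a fill that never collides with real input.
MISSING = object()
padded = list(zip_longest([1, 2, 3], [4], fillvalue=MISSING))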


Re: itertools.izip brokeness

2006-01-04 Thread rurpy

"Raymond Hettinger" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> [EMAIL PROTECTED] wrote:
> > izip's uses can be partitioned two ways:
> > 1. All iterables have equal lengths
> > 2. Iterables have different lengths.
> >
> > Case 1 is no problem obviously.
> > In Case 2 there are two sub-cases:
> >
> > 2a. You don't care what values occur in the other iterators
> >   after the end of the shortest.
> > 2b. You do care.
> >
> > In my experience 1 and 2b are the cases I encounter the most.
> > Seldom do I need case 2a.  That is, when I can have iterators
> > of unequal length, usually I want to do *something* with the
> > extra items in the longer iterators.  Seldom do I want to just
> > ignore them.
>
> That is a reasonable use case that is not supported by zip() or izip()
> as currently implemented.

I haven't thought a lot about zip because I haven't needed to.
I would phrase this as "...not supported by the itertools module...".
If it makes sense to extend izip() to provide end-of-longest
iteration, fine.  If not, then add an izip_longest() to itertools
(and perhaps a corresponding imap and whatever else shares
the terminate-at-shortest behavior).

> > The whole point of using izip is to make the code shorter,
> > more concise, and easier to write and understand.
>
> That should be the point of using anything in Python.  The specific
> goal for izip() was for an iterator version of zip().  Unfortunately,
> neither tool fits your problem.  At the root of it is the iterator
> protocol not having an unget() method for pushing back unused elements
> of the data stream.

I don't understand this.   Why do you need look ahead?  (I
mean that literally,  I am not disagreeing in a veiled way.)

This is my (mis?)understanding of how izip works:
- izip is a class
- when instantiated, it returns another iterator object, call it "x".
- the x object (being an iterator) has a next method that
  returns a list of the next values returned by all the iterators
  given when x was created.

So why can't izip's next method collect the results of
its set of argument iterators, as I presume it does now,
except that when one of them starts generating StopIteration
exceptions, an alternate value is placed in the result list?
When all the iterators start generating exceptions, izip
itself raises a StopIteration to signal that all the iterators
have reached exhaustion.  This is what the code I posted
in a message last night does.  Why is something like that
not acceptable?

All this talk of pushbacks and returning shorter lists of
unexhausted iterators makes me think I am misunderstanding
something.

> > This should be pointed out in the docs,
>
> I'll add a note to the docs.
>
> > However, it would be better if izip could be made useful
> > for case 2b situations.  Or maybe, an izip2 (or something)
> > added.
>
> Feel free to submit a feature request to the SF tracker (surprisingly,
> this behavior has not been previously reported, nor have there been any
> related feature requests, nor was the use case contemplated in the PEP
> discussions: http://www.python.org/peps/pep-0201 ).

Yes, this is interesting.  In the "print multiple columns"
example I presented, I felt the use of izip() met the
"one obvious way" test.  The resulting code was simple
and clear.  The real-world case where I ran into the
problem was comparing two files until two different
lines were found.  Again, izip was the "one obvious
way".

So yes it is surprising and disturbing that these use
cases were not identified.  I wonder what other features
that "should" be in Python, were similarly missed?
And more importantly what needs to change, to fix 
the problem?


Re: itertools.izip brokeness

2006-01-04 Thread Tom Anderson

On Wed, 4 Jan 2006, Raymond Hettinger wrote:

> [EMAIL PROTECTED] wrote:
>
>> The whole point of using izip is to make the code shorter, more 
>> concise, and easier to write and understand.
>
> That should be the point of using anything in Python.  The specific goal 
> for izip() was for an iterator version of zip().  Unfortunately, neither 
> tool fits your problem.  At the root of it is the iterator protocol not 
> having an unget() method for pushing back unused elements of the data 
> stream.

An unget() isn't absolutely necessary - another way of doing it would be a 
hasNext() method, as in Java, or a peek(), which gets the next item but 
doesn't advance the iterator.

Here's some code (pardon the old-fashioned functional style in the 
iter_foo methods):

import operator

class xiterable(object):
    """This is an entirely abstract class, just to document the
    xiterable interface.

    """
    def __iter__(self):
        """As in the traditional iterable protocol, returns an
        iterator over this object. Note that this does not have to
        be an xiterator.

        """
        raise NotImplementedError
    def __xiter__(self):
        """Returns an xiterator over this object.

        """
        raise NotImplementedError

class xiterator(xiterable):
    """This is an entirely abstract class, just to document the xiter
    interface.

    The xiterable methods should return self.
    """
    def hasNext(self):
        """Returns True if calling next would return a value, or
        False if it would raise StopIteration.

        """
        raise NotImplementedError
    def next(self):
        """As in the traditional iterator protocol.

        """
        raise NotImplementedError
    def peek(self):
        """Returns the value that would be returned by a call to
        next, but does not advance the iterator - the same value
        will be returned by the next call to peek or next. If a
        call to next would raise StopIteration, this method
        raises StopIteration.

        """
        raise NotImplementedError

def xiter(iterable):
    if (hasattr(iterable, "__xiter__")):
        return iterable.__xiter__()
    else:
        return xiterwrapper(iter(iterable))

class xiterwrapper(object):
    def __init__(self, it):
        self.it = it
        self.advance()
    def hasNext(self):
        return hasattr(self, "_next")
    def next(self):
        try:
            cur = self._next
            self.advance()
            return cur
        except AttributeError:
            raise StopIteration
    def peek(self):
        try:
            return self._next
        except AttributeError:
            raise StopIteration
    def advance(self):
        try:
            self._next = self.it.next()
        except StopIteration:
            if (hasattr(self, "_next")):
                del self._next
    def __xiter__(self):
        return self
    def __iter__(self):
        return self

def izip_hasnext(*xiters):
    xiters = map(xiter, xiters)
    while True:
        if (reduce(operator.and_, map(hasnext, xiters))):
            yield tuple(map(getnext, xiters))
        else:
            raise StopIteration

def hasnext(xit):
    return xit.hasNext()

def getnext(it):
    return it.next()

def izip_peek(*xiters):
    xiters = map(xiter, xiters)
    while True:
        z = tuple(map(peek, xiters))
        map(advance, xiters)
        yield z

def peek(xit):
    return xit.peek()

def advance(xit):
    return xit.advance()

Anyway, you get the general idea.
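
A compact Python 3 rendering of the same idea, offered as a sketch rather
than a port of the code above: the names Peekable and zip_hasnext are
invented here, and, like xiterwrapper, the wrapper reads one item ahead as
soon as it is constructed. The point of the example is the last two lines:
because the zip only advances when every input has a next item, nothing
is consumed and then dropped.

```python
class Peekable:
    """Iterator wrapper with one item of lookahead."""

    _EMPTY = object()  # marker: the underlying iterator is exhausted

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._advance()

    def _advance(self):
        # Fetch the following item now, or the marker if there is none.
        self._next = next(self._it, self._EMPTY)

    def has_next(self):
        return self._next is not self._EMPTY

    def peek(self):
        if not self.has_next():
            raise StopIteration
        return self._next

    def __iter__(self):
        return self

    def __next__(self):
        value = self.peek()
        self._advance()
        return value


def zip_hasnext(*iterables):
    """zip() that checks has_next() on every input before advancing any."""
    its = [it if isinstance(it, Peekable) else Peekable(it)
           for it in iterables]
    while its and all(it.has_next() for it in its):
        yield tuple(next(it) for it in its)


a = Peekable([1, 2, 3, 4])
b = Peekable([10, 20])
pairs = list(zip_hasnext(a, b))
leftover = list(a)  # the items of `a` that were never paired
```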

tom

-- 
I am the best at what i do.

Re: itertools.izip brokeness

2006-01-04 Thread Paul Rubin

"Raymond Hettinger" <[EMAIL PROTECTED]> writes:
> Feel free to submit a feature request to the SF tracker (surprisingly,
> this behavior has not been previously reported, nor have there been any
> related feature requests, nor was the use case contemplated in the PEP
> discussions: http://www.python.org/peps/pep-0201 ).

What do you think of my suggestion of passing an optional arg to the
StopIteration constructor, that the caller can use to resume the
iterator or take other suitable recovery steps?  Maybe this could
interact with PEP 343 in some useful way.


Re: itertools.izip brokeness

2006-01-04 Thread Raymond Hettinger

[EMAIL PROTECTED] wrote:
> izip's uses can be partitioned two ways:
> 1. All iterables have equal lengths
> 2. Iterables have different lengths.
>
> Case 1 is no problem obviously.
> In Case 2 there are two sub-cases:
>
> 2a. You don't care what values occur in the other iterators
>   after the end of the shortest.
> 2b. You do care.
>
> In my experience 1 and 2b are the cases I encounter the most.
> Seldom do I need case 2a.  That is, when I can have iterators
> of unequal length, usually I want to do *something* with the
> extra items in the longer iterators.  Seldom do I want to just
> ignore them.

That is a reasonable use case that is not supported by zip() or izip()
as currently implemented.

> The whole point of using izip is to make the code shorter,
> more concise, and easier to write and understand.

That should be the point of using anything in Python.  The specific
goal for izip() was for an iterator version of zip().  Unfortunately,
neither tool fits your problem.  At the root of it is the iterator
protocol not having an unget() method for pushing back unused elements
of the data stream.

> This should be pointed out in the docs,

I'll add a note to the docs.

> However, it would be better if izip could be made useful
> for case 2b situations.  Or maybe, an izip2 (or something)
> added.

Feel free to submit a feature request to the SF tracker (surprisingly,
this behavior has not been previously reported, nor have there been any
related feature requests, nor was the use case contemplated in the PEP
discussions: http://www.python.org/peps/pep-0201 ).

Raymond Hettinger


Re: itertools.izip brokeness

2006-01-03 Thread gene tani

[EMAIL PROTECTED] wrote:
> <[EMAIL PROTECTED]> wrote:

>
> pissed-offedly-yr's, rurpy
>
> 

Well, i'm sorry you're pissed off.  I will say i believe that

map(None,*sequences)

mentioned above is a pretty commonly seen thing, as is padding shorter
sequence for zip/izip.  Also have you looked at difflib?

http://docs.python.org/lib/module-difflib.html


Re: itertools.izip brokeness

2006-01-03 Thread rurpy

<[EMAIL PROTECTED]> wrote:
> But here is my real question...
> Why isn't something like this in itertools, or why shouldn't
> it go into itertools?

Never mind...
I just read this in the source code for itertools.imap:

  1) Itertools are designed to be easily combined and chained together.
     Having all tools stop with the shortest input is a unifying principle
     that makes it easier to combine finite iterators (supplying data) with
     infinite iterators like count() and repeat() (for supplying sequential
     or constant arguments to a function).

  2) In typical use cases for combining itertools, having one finite data
     supplier run out before another is likely to be an error condition which
     should not pass silently by automatically supplying None.

  3) The use cases for automatic None fill-in are rare -- not many functions
     do something useful when a parameter suddenly switches type and becomes
     None.

  4) If a need does arise, it can be met by __builtins__.map() or by
     writing:  chain(iterable, repeat(None)).

  5) Similar toolsets in Haskell and SML do not have automatic None fill-in.

I know I shouldn't post this but...

Jezuz Freekin' Cripes!!
This is the crap that drives me up a wall using Python.
I spent 10 hours writing the original non-working code,
finding the problem, posting to c.l.p, writing working
replacement code.  Why?  Because some pythonic
devdude sitting in his god throne declared that I don't
need to do what I needed to do!!

  1) Itertools are designed to be easily combined and chained together.
     Having all tools stop with the shortest input is a unifying principle
     that makes it easier to combine finite iterators (supplying data) with
     infinite iterators like count() and repeat() (for supplying sequential
     or constant arguments to a function).

There is not a count or a repeat to be seen anywhere in
my code.  So let's see...  I am prevented from using a
tool I need to use so that I will have the ability to use
some tools which I don't need.  Wow, that makes a lot
of f'ing sense.

  2) In typical use cases for combining itertools, having one finite data
     supplier run out before another is likely to be an error condition which
     should not pass silently by automatically supplying None.

Just plain wrong.  Files are very commonly used iterators,
and the typical case with them is that they are of different
lengths.  And I have run into plenty of non-file iterators of
differing lengths.  Declaring by fiat that these are all
error cases and shouldn't be permitted is bullshit.

  3) The use cases for automatic None fill-in are rare -- not many functions
     do something useful when a parameter suddenly switches type and becomes
     None.

So allow a caller-specified fill.  Not rocket science.

  4) If a need does arise, it can be met by __builtins__.map() or by
 writing:  chain(iterable, repeat(None)).

Yes, if you're a python guru.  I don't even understand the
code presented in this thread that uses chain/repeat, let
alone have any chance of writing it in less than a week.
For the average python user, having a tool to iterate to
the longest input is about 4 orders of magnitude simpler.

  5) Similar toolsets in Haskell and SML do not have automatic None fill-in.

What the hell does this have to do with anything?  Maybe
there are other reasons like they have alternate, better
ways of doing the same thing?  Does the fact that C++
lacks some feature justify leaving it out of Python?

Python is basically a pretty good language but there are
these big time holes in it.  I spend WAY too much time
trying to figure out how to do something that should be
easy, but isn't because someone thought that it might
hurt the "purity" of the language or violate some "principle".

pissed-offedly-yr's, rurpy


Re: itertools.izip brokeness

2006-01-03 Thread bonono


[EMAIL PROTECTED] wrote:
> It is clear that there is a real need for iterating in parallel
> over multiple iterators to the end of the longest one.  Why
> does something that stops at the shortest get included in
> the standard library, but one that stops after the longest
> doesn't?  Is there any hope for something like this being
> included in 2.5?
I wonder as well. map(None, ) does what you want, but it is not lazy.
imap(None,) for some reason doesn't do the same thing as map. And if you
don't want None as the sentinel, you can just write your own lambda (see
my other post). So for non-lazy needs, map already supports what you
want.

The suggestion of "unget" in iterators is interesting too, as I also
once wrote a restartable iterator wrapper using tee(), though I later
scrapped it as I found that turning them into lists is easier so long
as I don't need the lazy version.


Re: itertools.izip brokeness

2006-01-03 Thread rurpy

"Duncan Booth" <[EMAIL PROTECTED]> wrote:
> Peter Otten wrote:
>
> > from itertools import izip, chain, repeat
> >
> > def prt_files (file1, file2):
> >     file1 = chain(file1, repeat(""))
> >     file2 = chain(file2, repeat(""))
> >     for line1, line2 in iter(izip(file1, file2).next, ("", "")):
> >         print line1.rstrip(), "\t", line2.rstrip()
> >
> > which can easily be generalized for an arbitrary number of files.
>
> Generalizing for an arbitrary number of files and for an arbitrary value to
> pad out the shorter sequences:
>
> def paddedizip(pad, *args):
>     terminator = [pad] * (len(args)-1)
>     def padder():
>         if not terminator:
>             return
>         t = terminator.pop()
>         while 1:
>             yield t
>     return izip(*(chain(a, padder()) for a in args))
>
> >>> for (p,q) in paddedizip(0,[1,2,3],[4,5]):
>     print repr(p), repr(q)
>
>
> 1 4
> 2 5
> 3 0
[...more examples snipped...]

Here's what I came up with:

def izipl (*iterables, **kwds):
    sentinel = ""  # Default value, maybe None would be better?
    for k, v in kwds.items():  # Look for "sentinel" arg, error on any other.
        if k != "sentinel":
            raise TypeError, "got an unexpected keyword argument '%s'" % k
        else: sentinel = v
    iterables = map (iter, iterables)  # itertools.izip does this.

    while iterables:
        result = [];  cnt = 0
        for i in iterables:
            try: result.append (i.next())
            except StopIteration:  # builtin name; no "exceptions." prefix needed
                result.append (sentinel)
                cnt += 1
        # Every iterator was exhausted in this round: end the generator.
        if cnt == len (iterables): raise StopIteration
        yield tuple(result)

Hmm, your function returns an izip object, mine just returns
the results of the iteration.  So I guess my function would
be the next() method of a izipl class?  I still have not got
my head around this stuff :-(
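
On the function-versus-class question, a sketch in Python 3 syntax may
help. A generator function already returns an iterator object when called,
so no separate class is needed; a hand-written class does the same work in
its __next__ method (spelled next() in Python 2). Both names below,
zip_to_longest and ZipToLongest, are invented for the comparison.

```python
def zip_to_longest(*iterables, fill=None):
    """Generator version: calling it returns an iterator object."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        row = []
        exhausted = 0
        for it in iterators:
            try:
                row.append(next(it))
            except StopIteration:
                row.append(fill)
                exhausted += 1
        if exhausted == len(iterators):
            return  # all inputs are done; the generator ends here
        yield tuple(row)


class ZipToLongest:
    """Class version: the loop body above becomes __next__."""

    def __init__(self, *iterables, fill=None):
        self._iterators = [iter(it) for it in iterables]
        self._fill = fill

    def __iter__(self):
        return self

    def __next__(self):
        row = []
        exhausted = 0
        for it in self._iterators:
            try:
                row.append(next(it))
            except StopIteration:
                row.append(self._fill)
                exhausted += 1
        if exhausted == len(self._iterators):
            raise StopIteration
        return tuple(row)


from_generator = list(zip_to_longest([1, 2, 3], [4, 5], fill=0))
from_class = list(ZipToLongest([1, 2, 3], [4, 5], fill=0))
```

Both lists come out the same; the generator form is simply shorter.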

But here is my real question...
Why isn't something like this in itertools, or why shouldn't
it go into itertools?

It is clear that there is a real need for iterating in parallel
over multiple iterators to the end of the longest one.  Why
does something that stops at the shortest get included in
the standard library, but one that stops after the longest
doesn't?  Is there any hope for something like this being
included in 2.5?


Re: itertools.izip brokeness

2006-01-03 Thread rurpy

<[EMAIL PROTECTED]> wrote:
> But that is exactly the behaviour of python iterator, I don't see what
> is broken.
>
> izip/zip just read from the respective streams and give back a tuple
> if they can get one item from each; otherwise they stop. And because a
> python iterator can only go in one direction, the items already consumed
> are lost in the zip/izip calls.

[This is really a reply to the thread in general, not specifically
to your response Bonono...]

Yes, I can understand how the properties of iterators
and izip's design lead to the behavior I observed.
I am saying that the unfortunate interaction of those
properties leads to behavior that makes izip essentially
useless in many cases that one would naively expect it
not to be, that that behavior is not pointed out in the docs,
and is subtle enough that it is not realistic to expect
most users to realize it based on the properties of izip
and iterators alone.

izip's uses can be partitioned two ways:
1. All iterables have equal lengths
2. Iterables have different lengths.

Case 1 is no problem obviously.
In Case 2 there are two sub-cases:

2a. You don't care what values occur in the other iterators
  after the end of the shortest.
2b. You do care.

In my experience 1 and 2b are the cases I encounter the most.
Seldom do I need case 2a.  That is, when I can have iterators
of unequal length, usually I want to do *something* with the
extra items in the longer iterators.  Seldom do I want to just
ignore them.

In case 2b one cannot (naively) use izip, because izip
irretrievably throws away data when the end of the
shortest iterable is reached.

The whole point of using izip is to make the code shorter,
more concise, and easier to write and understand.   If I
have to add a lot of extra code to work around izip's problem,
or write my own izip function, then there is no point using
izip().  Or I could just write a simple while loop and handle
the iterators' exhaustions individually.
Ergo, izip is useless for situations involving case 2b.
This should be pointed out in the docs, particularly
since, depending on the order of izip's arguments,
it can appear to be working as one might initially
but erroneously think it should.
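
The order dependence can be seen in a few lines. This sketch uses the
Python 3 built-in zip(), on the assumption that it consumes its inputs in
the same left-to-right way as izip(); the helper name zip_then_rest is
made up for the demonstration.

```python
def zip_then_rest(first, second):
    """Zip two iterables, then report what each iterator still holds."""
    a, b = iter(first), iter(second)
    pairs = list(zip(a, b))
    return pairs, list(a), list(b)


# Shorter input first: zip() finds `a` empty before touching `b` again,
# so the tail of `b` (30, 40) is intact and all looks well.
short_first = zip_then_rest([1, 2], [10, 20, 30, 40])

# Longer input first: zip() reads 3 from `a`, then finds `b` empty.
# The 3 is gone; only 4 remains in `a`.
long_first = zip_then_rest([1, 2, 3, 4], [10, 20])
```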

However, it would be better if izip could be made useful
for case 2b situations.  Or maybe, an izip2 (or something)
added.


Re: itertools.izip brokeness

2006-01-03 Thread Tom Anderson

On Tue, 3 Jan 2006, it was written:

> [EMAIL PROTECTED] writes:
>
>> The problem is that sometimes, depending on which file is the shorter, 
>> a line ends up missing, appearing neither in the izip() output, or in 
>> the subsequent direct file iteration.  I would guess that it was in 
>> izip's buffer when izip terminates due to the exception on the other 
>> file.
>
> A different possible long term fix: change StopIteration so that it
> takes an optional arg that the program can use to figure out what
> happened.  Then change izip so that when one of its iterator args runs
> out, it wraps up the remaining ones in a new tuple and passes that
> to the StopIteration it raises.

+1

I think you also want to send back the items you read out of the iterators 
which are still alive, which otherwise would be lost. Here's a somewhat 
minimalist (but tested!) implementation:

def izip(*iters):
    while True:
        z = []
        try:
            for i in iters:
                z.append(i.next())
            yield tuple(z)
        except StopIteration:
            raise StopIteration, z

The argument you get back with the exception is z, the list of items read 
before the first empty iterator was encountered; if you still have your 
array iters hanging about, you can find the iterator which stopped with 
iters[len(z)], the ones which are still going with iters[:len(z)], and the 
ones which are in an uncertain state, since they were never tried, with 
iters[(len(z) + 1):]. This code could easily be extended to return more 
information explicitly, of course, but simple, sparse, etc.
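
For anyone trying this on Python 3, note that raising StopIteration inside
a generator is turned into a RuntimeError there (PEP 479), so the idea has
to be expressed with a class-based iterator. The following is a sketch
under that assumption; ZipWithLeftovers and drain() are illustrative
names. Since a plain for loop swallows the StopIteration, the caller has
to drive the iterator by hand to see the leftovers.

```python
class ZipWithLeftovers:
    """Like zip(), but the StopIteration it raises carries the items
    that were read in the final, incomplete round."""

    def __init__(self, *iterables):
        self.iterators = [iter(it) for it in iterables]

    def __iter__(self):
        return self

    def __next__(self):
        if not self.iterators:
            raise StopIteration([])
        row = []
        for it in self.iterators:
            try:
                row.append(next(it))
            except StopIteration:
                # Hand the partial row to the caller via the exception.
                raise StopIteration(row) from None
        return tuple(row)


def drain(zipper):
    """Collect complete rows, then return them with the leftovers."""
    rows = []
    while True:
        try:
            rows.append(next(zipper))
        except StopIteration as stop:
            return rows, stop.value


z = ZipWithLeftovers([1, 2, 3, 4], [10, 20])
rows, leftovers = drain(z)
stopped_at = len(leftovers)          # index of the iterator that ran out
still_in_first = list(z.iterators[0])
```

As in the description above, len(leftovers) identifies the iterator that
stopped, and the iterators before it are the ones that are still going.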

> You would want some kind of extended for-loop syntax (maybe involving 
> the new "with" statement) with a clean way to capture the exception 
> info.

How about for ... except?

for z in izip(a, b):
    lovingly_fondle(z)
except StopIteration, leftovers:
    angrily_discard(leftovers)

This has the advantage of not giving entirely new meaning to an existing 
keyword. It does, however, afford the somewhat dubious use:

for z in izip(a, b):
    lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution should almost certainly never get here

Perhaps that form should be taken as meaning:

try:
    for z in izip(a, b):
        lovingly_fondle(z)
except ValueError, leftovers:
    pass # execution could well get here if the fondling goes wrong

Although i think it would be more strictly correct if, more generally, it 
made:

for LOOP_VARIABLE in ITERATOR:
    SUITE
except EXCEPTION:
    HANDLER

Work like:

try:
    while True:
        try:
            LOOP_VARIABLE = ITERATOR.next()
        except EXCEPTION:
            raise __StopIteration__, sys.exc_info()
        except StopIteration:
            break
        SUITE
except __StopIteration__, exc_info:
    somehow_set_sys_exc_info(exc_info)
    HANDLER

As it stands, throwing a StopIteration in the suite inside a for loop 
doesn't terminate the loop - the exception escapes; by analogy, the 
for-except construct shouldn't trap exceptions from the loop body, only 
those raised by the iterator.

tom

-- 
Chance? Or sinister scientific conspiracy?

Re: itertools.izip brokeness

2006-01-03 Thread bonono


Paul Rubin wrote:
> Any idea how Haskell would deal with this?
I don't recall Haskell having the map(None,...) behaviour in the standard
Prelude. But then, I don't see how the iterator concept would fit into
Haskell either.


Re: itertools.izip brokeness

2006-01-03 Thread Duncan Booth

Peter Otten wrote:

> from itertools import izip, chain, repeat
> 
> def prt_files (file1, file2):
>     file1 = chain(file1, repeat(""))
>     file2 = chain(file2, repeat(""))
>     for line1, line2 in iter(izip(file1, file2).next, ("", "")):
>         print line1.rstrip(), "\t", line2.rstrip()
>  
> which can easily be generalized for an arbitrary number of files.

Generalizing for an arbitrary number of files and for an arbitrary value to 
pad out the shorter sequences:

def paddedizip(pad, *args):
    terminator = [pad] * (len(args)-1)
    def padder():
        if not terminator:
            return
        t = terminator.pop()
        while 1:
            yield t
    return izip(*(chain(a, padder()) for a in args))

>>> for (p,q) in paddedizip(0,[1,2,3],[4,5]):
    print repr(p), repr(q)


1 4
2 5
3 0
>>> for (p,q) in paddedizip(0,[1,2,3],[4,5,6,7,8]):
    print repr(p), repr(q)


1 4
2 5
3 6
0 7
0 8
>>> for (p,q) in paddedizip("",[1,2,3],[4,5,6,7,8]):
    print repr(p), repr(q)


1 4
2 5
3 6
'' 7
'' 8
>>> for (p,q,r) in paddedizip(None,[1,2,3],[4,5,6,7,8],[9]):
    print repr(p), repr(q), repr(r)


1 4 9
2 5 None
3 6 None
None 7 None
None 8 None

Re: itertools.izip brokeness

2006-01-03 Thread Paul Rubin

"Diez B. Roggisch" <[EMAIL PROTECTED]> writes:
> No. If you want that, use
> 
> list(iterable)
> 
> Then you have random access. If you _know_ there will be only so much data
> needed to "unget", write yourself a buffered iterator like this:

You can't use list(iterable) in general because the iterable may be
infinite.

> buffered(iterable, size)
> 
> Maybe something like that _could_ go in the itertools. But I'm not really
> convinced, as it is too tied to special cases -

The usual pushback depth needed is just one item and it would solve
various situations like this.  The ASPN recipe I cited for detecting
an empty iterable shows that such problems come up in more than one
place.

Note that besides the buffered iterable, there would also have to be a
version of izip that pushed unused items back onto its iterables.

> and besides that very easily done.

Just about everything in itertools is very easily done, so that's not
a valid argument against it.  

> > How about this (untested):
> > 
> >   def myzip(iterlist):
> >     """return zip of smaller and smaller list of iterables as the
> >     individual iterators run out"""
> 
> If that fits your semantics - of course. But the general zip shouldn't
> behave that way.

Of course it shouldn't.  In another post I suggested a way to extend
the general izip, to throw a list of the remaining non-empty iterables
once it hit an empty one.  Maybe there's some problem with that too,
but if so, it's more subtle.

Any idea how Haskell would deal with this?

Re: itertools.izip brokeness

2006-01-03 Thread Diez B. Roggisch

> What's broken is the iterator interface is insufficient to deal with
> this cleanly.

I don't consider it broken. You just think too much in terms of the OP's
problems or probably other fields where the actual data is available for
"rewinding".

But as iterators serve as an abstraction for lots of things - especially
generators - you can't enhance the interface.

> 
> Yes, that's the problem.  It's proven useful for i/o streams to support
> a pushback operation like ungetc.  Maybe something like it can be done
> for iterators.

No. If you want that, use

list(iterable)

Then you have random access. If you _know_ there will be only so much data
needed to "unget", write yourself a buffered iterator like this:

buffered(iterable, size)

Maybe something like that _could_ go in the itertools. But I'm not really
convinced, as it is too tied to special cases - and besides that very
easily done.


> How about this (untested):
> 
>   def myzip(iterlist):
>     """return zip of smaller and smaller list of iterables as the
>     individual iterators run out"""
>     sentinel = object()  # unique sentinel
>     def sentinel_append(iterable):
>        return itertools.chain(iterable, itertools.repeat(sentinel))
>     for i in itertools.izip(*map(sentinel_append, iterlist)):
>        r = [x for x in i if x is not sentinel]
>        if r: yield r
>        else: break

If that fits your semantics - of course. But the general zip shouldn't
behave that way.

Regards,

Diez

Re: itertools.izip brokeness

2006-01-03 Thread bonono


Paul Rubin wrote:
> [EMAIL PROTECTED] writes:
> > map(None,[1,2,3],[4,5]) gives [(1,4),(2,5),(3,None)]
>
> I didn't know that until checking the docs just now.  Oh man, what a
> hack!  I always thought Python should have a built-in identity
> function for situations like that.  I guess it does the above instead.
> Thanks.  Jeez ;-)
Of course, for OP's particular case, I think a specialized func() is
even better, as the None are turned into "" in the process which is
needed for string operation.

map(lambda *arg: tuple(map(lambda x: x is not None and x or "", arg)),
["a","b","c"],["d","e"])


Re: itertools.izip brokeness

2006-01-03 Thread Paul Rubin

[EMAIL PROTECTED] writes:
> map(None,[1,2,3],[4,5]) gives [(1,4),(2,5),(3,None)]

I didn't know that until checking the docs just now.  Oh man, what a
hack!  I always thought Python should have a built-in identity
function for situations like that.  I guess it does the above instead.
Thanks.  Jeez ;-)

Re: itertools.izip brokeness

2006-01-03 Thread bonono


Paul Rubin wrote:
> > I think you need to use map(None,...) which would not drop anything,
> > just None filled. Though you don't have a relatively lazy version as
> > imap(None,...) doesn't behave like map but a bit like zip.
>
> I don't understand what you mean by this?  None is not callable.

zip([1,2,3],[4,5])  gives [(1,4),(2,5)]

map(None,[1,2,3],[4,5]) gives [(1,4),(2,5),(3,None)]

So the result of map() can be filtered out for special processing. Of
course, your empty/sentinel filled version is doing more or less the
same thing.

>
> How about this (untested):
>
>   def myzip(iterlist):
>     """return zip of smaller and smaller list of iterables as the
>     individual iterators run out"""
>     sentinel = object()  # unique sentinel
>     def sentinel_append(iterable):
>        return itertools.chain(iterable, itertools.repeat(sentinel))
>     for i in itertools.izip(*map(sentinel_append, iterlist)):
>        r = [x for x in i if x is not sentinel]
>        if r: yield r
>        else: break


Re: itertools.izip brokeness

2006-01-03 Thread Paul Rubin

[EMAIL PROTECTED] writes:
> But that is exactly the behaviour of python iterator, I don't see what
> is broken.

What's broken is the iterator interface is insufficient to deal with
this cleanly.

> And because a python iterator can only go in one direction, the items
> already consumed are lost in the zip/izip calls.

Yes, that's the problem.  It's proven useful for i/o streams to support
a pushback operation like ungetc.  Maybe something like it can be done
for iterators.

> I think you need to use map(None,...) which would not drop anything,
> just None filled. Though you don't have a relatively lazy version as
> imap(None,...) doesn't behave like map but a bit like zip.

I don't understand what you mean by this?  None is not callable.

How about this (untested):

  def myzip(iterlist):
    """return zip of smaller and smaller list of iterables as the
    individual iterators run out"""
    sentinel = object()  # unique sentinel
    def sentinel_append(iterable):
       return itertools.chain(iterable, itertools.repeat(sentinel))
    # izip needs the iterables as separate arguments, hence the *
    for i in itertools.izip(*map(sentinel_append, iterlist)):
       # i is already a tuple; drop the padding entries
       r = [x for x in i if x is not sentinel]
       if r: yield r
       else: break

Re: itertools.izip brokeness

2006-01-03 Thread Peter Otten

[EMAIL PROTECTED] wrote:

> The problem is that sometimes, depending on which
> file is the shorter, a line ends up missing,
> appearing neither in the izip() output, or in
> the subsequent direct file iteration.  I would
> guess that it was in izip's buffer when izip
> terminates due to the exception on the other file.

With the current iterator protocol you cannot feed an item that you've read
from an iterator by calling its next() method back into it; but invoking
next() is the only way to see whether the iterator is exhausted. Therefore
the behaviour that breaks your prt_files() function has nothing to do with
the itertools. 
I think of itertools more as a toolbox than as a set of ready-made
solutions and came up with

from itertools import izip, chain, repeat

def prt_files (file1, file2):
    file1 = chain(file1, repeat(""))
    file2 = chain(file2, repeat(""))
    for line1, line2 in iter(izip(file1, file2).next, ("", "")):
        print line1.rstrip(), "\t", line2.rstrip()

which can easily be generalized for an arbitrary number of files.
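
One possible shape for that generalization, sketched in modern Python 3 (where izip is simply zip). The name zip_padded and its fill parameter are assumptions made for this example. It pads with a private sentinel object rather than with "", so a genuinely empty string in the data cannot end the loop early:

```python
from itertools import chain, repeat

def zip_padded(*iterables, fill=""):
    """Zip any number of iterables, padding the shorter ones with fill."""
    pad = object()  # private sentinel that cannot occur in the data
    padded = [chain(it, repeat(pad)) for it in iterables]
    for row in zip(*padded):
        if all(x is pad for x in row):
            return  # every input is exhausted
        yield tuple(fill if x is pad else x for x in row)

for row in zip_padded(["abc", "de", "fgh"], ["xyz", "wv"]):
    print("\t".join(row))
```

The design choice here is to separate the value used for loop control (the sentinel) from the value shown to the caller (fill).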

Peter


Re: itertools.izip brokeness

2006-01-03 Thread David Murmann

[EMAIL PROTECTED] wrote:
> [izip() eats one line]

as far as i can see the current implementation cannot be changed
to do the Right Thing in your case. python's iterators don't allow
you to "look ahead", so izip can only get the next element. if this
fails for one iterator, whatever was already read in that round is lost.

maybe the documentation for izip should note that the given
iterators are not necessarily in a sane state afterwards.
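
A small demonstration of that state, written for Python 3, where the built-in zip is lazy and behaves like izip in this respect. The helper name leftover_after_zip is made up for this example:

```python
def leftover_after_zip(long_items, short_items, long_first):
    """Zip two iterators, then report what is still left in the longer one."""
    long_it, short_it = iter(long_items), iter(short_items)
    if long_first:
        pairs = list(zip(long_it, short_it))
    else:
        pairs = list(zip(short_it, long_it))
    return pairs, list(long_it)

# Longer input first: zip reads "fgh", then finds the other input empty.
print(leftover_after_zip(["abc", "de", "fgh"], ["xyz", "wv"], True))
# ([('abc', 'xyz'), ('de', 'wv')], [])

# Shorter input first: zip stops before touching "fgh".
print(leftover_after_zip(["abc", "de", "fgh"], ["xyz", "wv"], False))
# ([('xyz', 'abc'), ('wv', 'de')], ['fgh'])
```

This is why the lost line depends on the order of the arguments.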

for your problem you can do something like:

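def izipall(*args):
    iters = [iter(it) for it in args]
    while iters:
        result = []
        for it in iters[:]:  # loop over a copy, since we remove from iters
            try:
                x = it.next()
            except StopIteration:
                iters.remove(it)
            else:
                result.append(x)
        if result:  # no empty tuple once everything has run out
            yield tuple(result)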

note that this does not yield tuples that are always the same
length, so "for x, y in izipall()" won't work. instead, do something
like "for seq in izipall(): print '\t'.join(seq)".
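
To make the variable-length behaviour concrete, here is the same idea as a Python 3 sketch. The name zip_all is an assumption for this example, not something from itertools:

```python
def zip_all(*iterables):
    """Yield tuples that shrink as the individual iterators run out."""
    iters = [iter(it) for it in iterables]
    while iters:
        row = []
        alive = []  # iterators that produced an item in this round
        for it in iters:
            try:
                row.append(next(it))
            except StopIteration:
                continue
            alive.append(it)
        iters = alive
        if row:
            yield tuple(row)

for seq in zip_all(["abc", "de", "fgh"], ["xyz", "wv"]):
    print("\t".join(seq))
```

Once an input drops out, the remaining values shift left, so a tuple alone no longer says which input a value came from.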

hope i was clear enough, David.

Re: itertools.izip brokeness

2006-01-03 Thread bonono

But that is exactly the behaviour of Python iterators; I don't see what
is broken.

izip/zip just read from the respective streams and give back a tuple
if they can get an item from each; otherwise they stop. And because a
Python iterator can only go in one direction, the items already
consumed are lost in the zip/izip calls.

I think you need to use map(None,...), which would not drop anything;
it just fills with None. There is no equivalent lazy version, though, as
imap(None,...) doesn't behave like map but more like zip.
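
For readers on Python 3: map(None, ...) no longer works there, but itertools.zip_longest gives the same padding behaviour lazily. A minimal sketch, with the helper name side_by_side made up for this example:

```python
from itertools import zip_longest

def side_by_side(lines1, lines2):
    """Pair up lines lazily, padding the shorter input with ''."""
    for left, right in zip_longest(lines1, lines2, fillvalue=""):
        yield left.rstrip() + "\t" + right.rstrip()

for row in side_by_side(["abc\n", "de\n", "fgh\n"], ["xyz\n", "wv\n"]):
    print(row)
```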

[EMAIL PROTECTED] wrote:
> The code below should be pretty self-explanatory.
> I want to read two files in parallel, so that I
> can print corresponding lines from each, side by
> side.  itertools.izip() seems the obvious way
> to do this.
>
> izip() will stop iterating when it reaches the
> end of the shortest file.  I don't know how to
> tell which file was exhausted so I just try printing
> them both.  The exhausted one will generate a
> StopIteration, the other will continue to be
> iterable.
>
> The problem is that sometimes, depending on which
> file is the shorter, a line ends up missing,
> appearing neither in the izip() output, nor in
> the subsequent direct file iteration.  I would
> guess that it was in izip's buffer when izip
> terminates due to the exception on the other file.
>
> This behavior seems plain out broken, especially
> because it is dependent on order of izip's
> arguments, and not documented anywhere I saw.
> It makes using izip() for iterating files in
> parallel essentially useless (unless you are
> lucky enough to have files of the same length).
>
> Also, it seems to me that this is likely a problem
> with any iterables with different lengths.
> I am hoping I am missing something...
>
> #-
> # Task: print contents of file1 in column 1, and
> # contents of file2 in column two.  iterators and
> # izip() are the "obvious" way to do it.
>
> from itertools import izip
> import cStringIO, pdb
>
> def prt_files (file1, file2):
>
>     for line1, line2 in izip (file1, file2):
>         print line1.rstrip(), "\t", line2.rstrip()
>
>     try:
>         for line1 in file1:
>             print line1,
>     except StopIteration: pass
>
>     try:
>         for line2 in file2:
>             print "\t",line2,
>     except StopIteration: pass
>
> if __name__ == "__main__":
>     # Use StringIO to simulate files.  Real files
>     # show the same behavior.
>     f = cStringIO.StringIO
>
>     print "Two files with same number of lines work ok."
>     prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\nstu\n"))
>
>     print "\nFirst file shorter is also ok."
>     prt_files (f("abc\nde\n"), f("xyz\nwv\nstu\n"))
>
>     print "\nSecond file shorter is a problem."
>     prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\n"))
>     print "What happened to \"fgh\" line that should be in column 1?"
>
>     print "\nBut only a problem for one line."
>     prt_files (f("abc\nde\nfgh\nijk\nlm\n"), f("xyz\nwv\n"))
>     print "The line \"fgh\" is still missing, but following\n" \
>           "line(s) are ok!  Looks like izip() ate a line."


Re: itertools.izip brokeness

2006-01-03 Thread Paul Rubin

[EMAIL PROTECTED] writes:
> The problem is that sometimes, depending on which file is the
> shorter, a line ends up missing, appearing neither in the izip()
> output, nor in the subsequent direct file iteration.  I would guess
> that it was in izip's buffer when izip terminates due to the
> exception on the other file.

Oh man, this is ugly.  The problem is there's no way to tell whether
an iterator is empty, other than by reading from it.  

  http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/413614

has a kludge that you can use inside a function but that's no good
for something like izip.  

For a temporary hack you could make a wrapped iterator that allows
pushing items back onto the iterator (sort of like ungetc) and a
version of izip that uses it, or a version of izip that tests the
iterators you pass it using the above recipe.
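
A rough Python 3 sketch of the first variant. Instead of a wrapper class it "pushes back" by chaining the already-read item in front of its iterator. The class name LosslessZip and the attribute remaining are hypothetical, chosen for this example:

```python
from itertools import chain

class LosslessZip:
    """zip-like iterator; once it stops, `remaining` holds one iterator
    per input, with items read in the last, incomplete round put back."""

    def __init__(self, *iterables):
        self._iters = [iter(it) for it in iterables]
        self.remaining = None  # filled in once zipping stops

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining is not None:
            raise StopIteration
        row = []
        for it in self._iters:
            try:
                row.append(next(it))
            except StopIteration:
                # Put back what this round already consumed.
                pushed = [chain([x], src) for x, src in zip(row, self._iters)]
                self.remaining = pushed + self._iters[len(row):]
                raise
        if not row:  # called with no inputs at all
            self.remaining = []
            raise StopIteration
        return tuple(row)

z = LosslessZip(["abc", "de", "fgh"], ["xyz", "wv"])
print(list(z))                          # [('abc', 'xyz'), ('de', 'wv')]
print([list(r) for r in z.remaining])   # [['fgh'], []]
```

The caller continues from z.remaining rather than from the original iterators, so the "fgh" line is not lost.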

It's probably not reasonable to ask that an emptiness test be added to
the iterator interface, since the zillion iterator implementations now
existing won't support it.  

A different possible long term fix: change StopIteration so that it
takes an optional arg that the program can use to figure out what
happened.  Then change izip so that when one of its iterator args runs
out, it wraps up the remaining ones in a new tuple and passes that
to the StopIteration it raises.  Untested:

   def izip(*iterlist):
      while True:
         z = []
         finished = []      # iterators that have run out
         still_alive = []   # iterators that are still alive
         for i in iterlist:
            try:
               z.append(i.next())
               still_alive.append(i)
            except StopIteration:
               finished.append(i)
         if not finished:
            yield tuple(z)
         else:
            raise StopIteration, (still_alive, finished)

You would want some kind of extended for-loop syntax (maybe involving
the new "with" statement) with a clean way to capture the exception info.
You'd then use it to continue the izip where it left off, with the
new (smaller) list of iterators.
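
For comparison, Python 3.3 and later let a generator attach a value to the StopIteration that ends it: a `return value` inside a generator becomes the exception's .value attribute. Below is a sketch of the proposal in those terms, assuming Python 3. The names zip_report and drain are hypothetical, and returning the partially read row is an addition that the untested code above does not have:

```python
def zip_report(*iterables):
    """Like zip, but report what happened via the generator's return value."""
    iters = [iter(it) for it in iterables]
    while True:
        row, still_alive, finished = [], [], []
        for it in iters:
            try:
                row.append(next(it))
                still_alive.append(it)
            except StopIteration:
                finished.append(it)
        if finished or not iters:
            # Becomes StopIteration.value; row holds the items already
            # read from still_alive in this incomplete round.
            return still_alive, finished, row
        yield tuple(row)

def drain(gen):
    """Collect a generator's items and its StopIteration value."""
    rows = []
    while True:
        try:
            rows.append(next(gen))
        except StopIteration as stop:
            return rows, stop.value

rows, (alive, done, partial) = drain(zip_report(["abc", "de", "fgh"], ["xyz", "wv"]))
print(rows)      # [('abc', 'xyz'), ('de', 'wv')]
print(partial)   # ['fgh']
```

A plain for loop still discards the value, so the caller has to drive the generator by hand as drain() does, or delegate to it with `yield from`.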

itertools.izip brokeness

2006-01-03 Thread rurpy

The code below should be pretty self-explanatory.
I want to read two files in parallel, so that I
can print corresponding lines from each, side by
side.  itertools.izip() seems the obvious way
to do this.

izip() will stop iterating when it reaches the
end of the shortest file.  I don't know how to
tell which file was exhausted so I just try printing
them both.  The exhausted one will generate a
StopIteration, the other will continue to be
iterable.

The problem is that sometimes, depending on which
file is the shorter, a line ends up missing,
appearing neither in the izip() output, nor in
the subsequent direct file iteration.  I would
guess that it was in izip's buffer when izip
terminates due to the exception on the other file.

This behavior seems plain out broken, especially
because it is dependent on order of izip's
arguments, and not documented anywhere I saw.
It makes using izip() for iterating files in
parallel essentially useless (unless you are
lucky enough to have files of the same length).

Also, it seems to me that this is likely a problem
with any iterables with different lengths.
I am hoping I am missing something...

#-
# Task: print contents of file1 in column 1, and
# contents of file2 in column two.  iterators and
# izip() are the "obvious" way to do it.

from itertools import izip
import cStringIO, pdb

def prt_files (file1, file2):

    for line1, line2 in izip (file1, file2):
        print line1.rstrip(), "\t", line2.rstrip()

    try:
        for line1 in file1:
            print line1,
    except StopIteration: pass

    try:
        for line2 in file2:
            print "\t",line2,
    except StopIteration: pass

if __name__ == "__main__":
    # Use StringIO to simulate files.  Real files
    # show the same behavior.
    f = cStringIO.StringIO

    print "Two files with same number of lines work ok."
    prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\nstu\n"))

    print "\nFirst file shorter is also ok."
    prt_files (f("abc\nde\n"), f("xyz\nwv\nstu\n"))

    print "\nSecond file shorter is a problem."
    prt_files (f("abc\nde\nfgh\n"), f("xyz\nwv\n"))
    print "What happened to \"fgh\" line that should be in column 1?"

    print "\nBut only a problem for one line."
    prt_files (f("abc\nde\nfgh\nijk\nlm\n"), f("xyz\nwv\n"))
    print "The line \"fgh\" is still missing, but following\n" \
          "line(s) are ok!  Looks like izip() ate a line."

