RE: Controlling a generator the pythonic way

2005-06-13 Thread Delaney, Timothy C (Timothy)
FWIW, PEP 342 is now titled "Coroutines via Enhanced Iterators" :)

Tim Delaney
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Controlling a generator the pythonic way

2005-06-13 Thread Thomas Lotze
Thomas Lotze wrote:

> I'm trying to figure out what is the most pythonic way to interact with a
> generator.

JFTR, so you don't think I'd suddenly lost interest: I won't be able to
respond for a couple of days because I've just incurred a nice little
hospital session... will be back next week.

-- 
Thomas


RE: Controlling a generator the pythonic way

2005-06-13 Thread Delaney, Timothy C (Timothy)
Steve Holden wrote:

> Sigh indeed. But if you allow next() calls to take arguments you are
> effectively arguing for the introduction of full coroutines into the
> language, and I suspect there would be pretty limited support for
> that. 

You mean `PEP 342`_ which I posted earlier and is considered pretty
non-controversial?

I think I may suggest that the name of the PEP be changed to "Coroutines
using advanced iterators".

.. _`PEP 342`: http://www.python.org/peps/pep-0342.html

Tim Delaney


Re: Controlling a generator the pythonic way

2005-06-13 Thread Steve Holden
Thomas Lotze wrote:
> Peter Hansen wrote:
> 
> 
>>Thomas Lotze wrote:
>>
>>>I can see two possibilities to do this: either the current file position
>>>has to be read from somewhere (say, a mutable object passed to the
>>>generator) after each yield, [...]
>>
>>The third approach, which is certain to be cleanest for this situation, is
>>to have a custom class which stores the state information you need, and
>>have the generator simply be a method in that class.
> 
> 
> Which is, as far as the generator code is concerned, basically the same as
> passing a mutable object to a (possibly standalone) generator. The object
> will likely be called self, and the value is stored in an attribute of it.
> 
> Probably this is indeed the best way as it doesn't require the programmer
> to remember any side-effects.
> 
> It does, however, require a lot of attribute access, which does cost some
> cycles.
> 
Hmm, you could probably make your program run even quicker if you took 
out all the code :-)

Don't assume that there will be a perceptible impact on performance 
until you have written it the easy way. I'll leave you to Google for 
quotes from Donald Knuth about premature optimization.

> A related problem is skipping whitespace. Sometimes you don't care about
> whitespace tokens, sometimes you do. Using generators, you can either set
> a state variable, say on the object the generator is an attribute of,
> before each call that requires a deviation from the default, or you can
> have a second generator for filtering the output of the first. Again, both
> solutions are ugly (the second more so than the first). One uses
> side-effects instead of passing parameters, which is what one really
> wants, while the other is dumb and slow (filtering can be done without
> taking a second look at things).
> 
And, again, your obsession with performance obscures the far more 
important issue: which solution is easiest to write and maintain. If the 
user then turns up short of cycles they can always elect to migrate to a 
faster computer: this will almost inevitably be cheaper than paying you 
to speed the program up.

> All of this makes me wonder whether more elaborate generator semantics
> (maybe even allowing for passing arguments in the next() call) would not
> be useful. And yes, I have read the recent postings on PEP 343 - sigh.
> 
Sigh indeed. But if you allow next() calls to take arguments you are 
effectively arguing for the introduction of full coroutines into the 
language, and I suspect there would be pretty limited support for that.

regards
  Steve
-- 
Steve Holden        +1 703 861 4237  +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/
Python Web Programming  http://pydish.holdenweb.com/



RE: Controlling a generator the pythonic way

2005-06-12 Thread Delaney, Timothy C (Timothy)
Thomas Lotze wrote:

> call. The picture might fit better (IMO) if it didn't look so much
> like working around the fact that the next() call can't take
> parameters for some technical reason. 

You might want to take a look at PEP 342. It doesn't help you now, but it
will in the future.
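
For readers arriving later: PEP 342 was accepted and shipped in Python 2.5, where it added a send() method to generators. Below is a minimal sketch of how a per-call parameter could then reach a tokenizer. The toy whitespace-splitting tokenizer and the name `tokens` are illustrative assumptions, not anything from this thread, and the code is written for Python 3 (so next(gen) replaces the gen.next() spelling used elsewhere here).

```python
import re

def tokens(text):
    """Toy tokenizer: yields runs of whitespace or of non-whitespace.

    The value passed to send() becomes the result of the yield
    expression and decides whether whitespace is skipped while
    looking for the next token only.
    """
    skip_space = False
    for match in re.finditer(r"\s+|\S+", text):
        tok = match.group()
        if skip_space and tok.isspace():
            continue
        # A plain next() call makes the yield expression evaluate to None.
        skip_space = bool((yield tok))

gen = tokens("a  b c")
first = next(gen)         # plain call, default behaviour
second = gen.send(True)   # ask for whitespace to be skipped this once
third = next(gen)         # back to the default
```

The option applies to a single call because each plain next() resets it, which is the "state change meant to affect only a single call" asked for in the quoted text.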

Tim Delaney


Re: Controlling a generator the pythonic way

2005-06-12 Thread Terry Reedy

"news:[EMAIL PROTECTED]
> Thomas Lotze <[EMAIL PROTECTED]> writes:
>> A related problem is skipping whitespace. Sometimes you don't care about
>> whitespace tokens, sometimes you do. Using generators, you can either set
>> a state variable, say on the object the generator is an attribute of,
>> before each call that requires a deviation from the default, or you can
>> have a second generator for filtering the output of the first. Again, both
>> solutions are ugly (the second more so than the first).

Given an application that *only* wanted non-white tokens, or tokens meeting 
any other condition, filtering is, to me, exactly the right thing to do and 
not ugly at all.  See itertools or roll your own.
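
A minimal sketch of the itertools route, assuming string tokens and using str.isspace as the whiteness test. The `all_tokens` generator and its data are hypothetical stand-ins for a real tokenizer. It is written for Python 3, where the function is itertools.filterfalse (Python 2 spelled it ifilterfalse).

```python
from itertools import filterfalse

def all_tokens():
    # Stand-in for a real tokenizer (hypothetical data).
    yield from ["obj", " ", "<<", "\n", ">>"]

# Lazily drop every token that consists only of whitespace.
nonwhite = filterfalse(str.isspace, all_tokens())
result = list(nonwhite)
```

The filter is lazy, so tokens are still pulled from the underlying generator one at a time.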

Given an application that intermittently wanted to skip over whitespace 
tokens, I would use a *function*, not a second generator, that filtered the 
first when, and only when, that was wanted.  Given next_tok, the next 
method of a token generator, this is simply

def next_nonwhite():
    # iswhite is whatever predicate identifies a whitespace token
    ret = next_tok()
    while iswhite(ret):
        ret = next_tok()
    return ret

A generic method of sending data to a generator on the fly, without making 
it an attribute of a class, is to give the generator function a mutable 
parameter, a list, dict, or instance, which you mutate from outside as 
desired to change the operation of the generator.

The pair of statements
  (mutate the parameter)
  val = gen.next()
can, of course, be wrapped in various possible gennext(args) functions at 
the cost of an additional function call.
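
One possible shape for the mutable-parameter idea and a gennext wrapper, as a sketch only. The dict key "skip_space", the toy token list, and the wrapper signature are assumptions made for illustration, and it uses the Python 3 next(gen) spelling.

```python
def tokens(words, options):
    """Yield words, consulting the shared `options` dict before each one."""
    for word in words:
        if options.get("skip_space") and word.isspace():
            continue
        yield word

options = {"skip_space": False}
gen = tokens(["a", " ", "b", " ", "c"], options)

def gennext(skip_space=False):
    # The "pair of statements": mutate the parameter, then advance.
    options["skip_space"] = skip_space
    return next(gen)

out = [gennext(), gennext(skip_space=True), gennext()]
```

Because the generator looks at `options` only after it is resumed, a change made just before the call is seen by that call.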

Terry J. Reedy


Re: Controlling a generator the pythonic way

2005-06-12 Thread Kent Johnson
Thomas Lotze wrote:
> Mike Meyer wrote:
> What worries me about the approach of changing state before making a
> next() call instead of doing it at the same time by passing a parameter is
> that the state change is meant to affect only a single call. The picture
> might fit better (IMO) if it didn't look so much like working around the
> fact that the next() call can't take parameters for some technical reason.

I suggest you make the tokenizer class itself into an iterator. Then you can 
define additional next() methods with additional parameters. You could wrap an 
actual generator for the convenience of having multiple yield statements. For 
example (borrowing Peter's PdfTokenizer):

class PdfTokenizer:
    def __init__(self, ...):
        # set up initial state
        self._tokenizer = self._getTokens()

    def __iter__(self):
        return self

    def next(self, options=None):
        # set self state according to options, if any
        n = self._tokenizer.next()
        # restore default state
        return n

    def nextIgnoringSpace(self):
        # alternate way of specifying variations
        # ...

    def _getTokens(self):
        while whatever:
            yield token

    def seek(self, newPosition):
        # change state here
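
A runnable toy version of the outline above, as a sketch under stated assumptions: the class name `ToyTokenizer`, the regex-based tokenizing of a plain string, and the `skip_space` option are all invented for illustration. It targets Python 3, so the iterator method is __next__ and the parameterized variant gets its own name.

```python
import re

class ToyTokenizer:
    """Iterator wrapping a generator; next_token() takes a per-call option."""

    def __init__(self, text):
        self._runs = re.findall(r"\s+|\S+", text)
        self._skip_space = False
        self._tokenizer = self._get_tokens()

    def __iter__(self):
        return self

    def __next__(self):
        return self.next_token()

    def next_token(self, skip_space=False):
        self._skip_space = skip_space    # set state for this call only
        try:
            return next(self._tokenizer)
        finally:
            self._skip_space = False     # restore the default

    def _get_tokens(self):
        for run in self._runs:
            if self._skip_space and run.isspace():
                continue
            yield run

tok = ToyTokenizer("a  b c")
first = tok.next_token()
second = tok.next_token(skip_space=True)
rest = list(tok)
```

The try/finally restores the default even when the wrapped generator is exhausted and StopIteration propagates.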

Kent


Re: Controlling a generator the pythonic way

2005-06-12 Thread Thomas Lotze
Thomas Lotze wrote:

> A related problem is skipping whitespace. Sometimes you don't care about
> whitespace tokens, sometimes you do. Using generators, you can either set
> a state variable, say on the object the generator is an attribute of,
> before each call that requires a deviation from the default, or you can
> have a second generator for filtering the output of the first.

Last night's sleep was really productive - I've also found another way
to tackle this problem, and it's really simple IMO. One could pass the
parameter at generator instantiation time and simply create two
generators behaving differently. They work on the same data and use the
same source code, only with a different parametrization.

All one has to care about is that they never get out of sync. If the
data pointer is an object attribute, it's clear how to do it. Otherwise,
both could acquire their data from a common generator that yields the
PDF content (or a buffer representing part of it) character by
character. This is even faster than keeping a pointer and using it as an
index on the data.
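
A rough sketch of the two-generators idea, assuming the simplest possible grammar in which every character is its own token. The names `tokens` and `source` are illustrative, and the code uses Python 3's next().

```python
def tokens(chars, skip_space):
    """Turn characters from the shared iterator `chars` into tokens."""
    for ch in chars:
        if skip_space and ch.isspace():
            continue
        yield ch

source = iter("a b c")   # the common character source
with_space = tokens(source, skip_space=False)
without_space = tokens(source, skip_space=True)

# Both generators pull from `source`, so advancing one advances the other.
out = [next(with_space), next(without_space), next(with_space)]
```

One caveat with this arrangement: a tokenizer that needs lookahead would consume a character the other generator never sees, so a real grammar would probably need a pushback buffer on the shared source.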

-- 
Thomas


Re: Controlling a generator the pythonic way

2005-06-12 Thread Thomas Lotze
Thomas Lotze wrote:

> Does anybody here have a third way of dealing with this?

Sleeping a night sometimes is an insightful exercise *g*

I realized that there is a reason why fiddling with the pointer from
outside the generator defeats much of the purpose of using one. The
implementation using a simple method call instead of a generator needs
to store some internal state variables on an object to save them for the
next call, among them the pointer and a tokenization mode.

I could make the thing a generator by turning the single return
statement into a yield statement and adding a loop, leaving all the
importing and exporting of the pointer intact - after all, someone might
reset the pointer between next() calls.

This is, however, hardly using all the possibilities a generator allows.
I'd rather like to get rid of the mode switches by doing special things
where I detect the need for them, yielding the result, and proceeding as
before. But as soon as I move information from explicit (state variables
that can be reset along with the pointer) to implicit (the point where
the generator is suspended after yielding a token), resetting the
pointer will lead to inconsistencies.

So, it seems to me that if I do want to use generators for any practical
reason instead of just because generators are way cool, they need to be
instantiated anew each time the pointer is reset, for simple consistency
reasons.

Now a very simple idea struck me: If one is worried about throwing away
a generator as a side-effect of resetting the tokenization pointer, why
not define the whole tokenizer as not being resettable? Then the thing
needs to be re-instantiated very explicitly every time it is pointed
somewhere. While still feeling slightly awkward, it has lost the threat
of doing unexpected things.

Does this sound reasonable?
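
To make the proposal concrete, a minimal sketch in which the tokenizer cannot be reset and "seeking" means asking for a new generator. The function name `tokens_from` and the one-character tokens are assumptions for illustration (Python 3 syntax).

```python
def tokens_from(data, start):
    """Return a fresh generator tokenizing `data` from offset `start`."""
    pos = start
    while pos < len(data):
        yield data[pos]
        pos += 1

data = "abcdef"
gen = tokens_from(data, 0)
head = [next(gen), next(gen)]

# Pointing the tokenizer elsewhere is an explicit re-instantiation.
gen = tokens_from(data, 4)
tail = list(gen)
```

The position is a local variable here, so nothing outside the generator can make its implicit state inconsistent.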

-- 
Thomas


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Peter Hansen wrote:

> Fair enough, but who cares what the generator code thinks?  It's what the
> programmer has to deal with that matters, and an object is going to have a
> cleaner interface than a generator-plus-mutable-object.

That's right, and among the choices discussed, the object is the one I do
prefer. I just don't feel really satisfied...

>> It does, however, require a lot of attribute access, which does cost
>> some cycles.
> 
> Hmm... "premature optimization" is all I have to say about that.

But when is the right time to optimize? There's a point when the thing
runs, does the right thing and - by the token of "make it run, make it
right, make it fast" - might get optimized. And if there are places in a
PDF library that might justly be optimized, the tokenizer is certainly one
of them as it gets called really often.

Still, I'm going to focus on cleaner code and, first and foremost, a clean
API if it comes to a decision between these goals and optimization - at
least as long as I'm talking about pure Python code.
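
One way to find out whether attribute access matters for a given loop is to measure it once the code works. A small sketch with the standard timeit module follows; the `Holder` class, the loop sizes, and the two functions are made up for illustration, and the numbers will differ between machines and Python versions.

```python
import timeit

class Holder:
    def __init__(self):
        self.pos = 0

def via_attribute(n=10000):
    # Keep the counter on an object, as a tokenizer class would.
    h = Holder()
    for _ in range(n):
        h.pos += 1
    return h.pos

def via_local(n=10000):
    # Keep the counter in a local variable, as a generator would.
    pos = 0
    for _ in range(n):
        pos += 1
    return pos

t_attr = timeit.timeit(via_attribute, number=20)
t_local = timeit.timeit(via_local, number=20)
print("attribute: %.4fs  local: %.4fs" % (t_attr, t_local))
```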

-- 
Thomas


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Mike Meyer wrote:

> Yes, such a switch gets the desired behavior as a side effect. Then again,
> a generator that returns tokens has a desired behavior (advancing to the
> next token) as a side effect(*).

That's certainly true.

> If you think about these things as the
> state of the object, rather than "side effects", it won't seem nearly as
> ugly. In fact, part of the point of using a class is to encapsulate the
> state required for some activity in one place.
> 
> Wanting to do everything via parameters to methods is a very top-down way
> of looking at the problem. It's not necessarily correct in an OO
> environment.

What worries me about the approach of changing state before making a
next() call instead of doing it at the same time by passing a parameter is
that the state change is meant to affect only a single call. The picture
might fit better (IMO) if it didn't look so much like working around the
fact that the next() call can't take parameters for some technical reason.

I agree that decoupling state changes and next() calls would be perfectly
beautiful if they were decoupled in the problem one wants to model. They
aren't.

> *) It's noticeable that some OO languages/libraries avoid this side
> effect: the read method updates an attribute, so you do the read then
> get the object read from the attribute. That's very OO, but not very
> pythonic.

Just out of curiosity: What makes you state that that behaviour isn't
pythonic? Is it because Python happens to do it differently, because of a
gut feeling, or because of some design principle behind Python I fail to
see right now?

-- 
Thomas


Re: Controlling a generator the pythonic way

2005-06-11 Thread Mike Meyer
Thomas Lotze <[EMAIL PROTECTED]> writes:
> A related problem is skipping whitespace. Sometimes you don't care about
> whitespace tokens, sometimes you do. Using generators, you can either set
> a state variable, say on the object the generator is an attribute of,
> before each call that requires a deviation from the default, or you can
> have a second generator for filtering the output of the first. Again, both
> solutions are ugly (the second more so than the first). One uses
> side-effects instead of passing parameters, which is what one really
> wants, while the other is dumb and slow (filtering can be done without
> taking a second look at things).

I wouldn't call the first method ugly; I'd say it's *very* OO.

Think of an object instance as a machine. It has various knobs,
switches and dials you can use to control its behavior, and displays
you can use to read data from it, or parts of its state. A switch
labelled "ignore whitespace" is a perfectly reasonable thing for a
tokenizing machine to have.

Yes, such a switch gets the desired behavior as a side effect. Then
again, a generator that returns tokens has a desired behavior
(advancing to the next token) as a side effect(*). If you think about
these things as the state of the object, rather than "side effects",
it won't seem nearly as ugly. In fact, part of the point of using a
class is to encapsulate the state required for some activity in one
place.

Wanting to do everything via parameters to methods is a very top-down
way of looking at the problem. It's not necessarily correct in an OO
environment.

  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.


Re: Controlling a generator the pythonic way

2005-06-11 Thread Peter Hansen
Thomas Lotze wrote:
> Which is, as far as the generator code is concerned, basically the same as
> passing a mutable object to a (possibly standalone) generator. The object
> will likely be called self, and the value is stored in an attribute of it.

Fair enough, but who cares what the generator code thinks?  It's what 
the programmer has to deal with that matters, and an object is going to 
have a cleaner interface than a generator-plus-mutable-object.

> Probably this is indeed the best way as it doesn't require the programmer
> to remember any side-effects.
> 
> It does, however, require a lot of attribute access, which does cost some
> cycles.

Hmm... "premature optimization" is all I have to say about that.

-Peter


Re: Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Peter Hansen wrote:

> Thomas Lotze wrote:
>> I can see two possibilities to do this: either the current file position
>> has to be read from somewhere (say, a mutable object passed to the
>> generator) after each yield, [...]
> 
> The third approach, which is certain to be cleanest for this situation, is
> to have a custom class which stores the state information you need, and
> have the generator simply be a method in that class.

Which is, as far as the generator code is concerned, basically the same as
passing a mutable object to a (possibly standalone) generator. The object
will likely be called self, and the value is stored in an attribute of it.

Probably this is indeed the best way as it doesn't require the programmer
to remember any side-effects.

It does, however, require a lot of attribute access, which does cost some
cycles.

A related problem is skipping whitespace. Sometimes you don't care about
whitespace tokens, sometimes you do. Using generators, you can either set
a state variable, say on the object the generator is an attribute of,
before each call that requires a deviation from the default, or you can
have a second generator for filtering the output of the first. Again, both
solutions are ugly (the second more so than the first). One uses
side-effects instead of passing parameters, which is what one really
wants, while the other is dumb and slow (filtering can be done without
taking a second look at things).

All of this makes me wonder whether more elaborate generator semantics
(maybe even allowing for passing arguments in the next() call) would not
be useful. And yes, I have read the recent postings on PEP 343 - sigh.

-- 
Thomas


Re: Controlling a generator the pythonic way

2005-06-11 Thread Peter Hansen
Thomas Lotze wrote:
> I can see two possibilities to do this: either the current file position
> has to be read from somewhere (say, a mutable object passed to the
> generator) after each yield, or a new generator needs to be instantiated
> every time the tokenizer is pointed to a new file position.
>... 
> Does anybody here have a third way of dealing with this? Otherwise,
> which ugliness is the more pythonic one?

The third approach, which is certain to be cleanest for this situation, 
is to have a custom class which stores the state information you need, 
and have the generator simply be a method in that class.  There's no 
reason that a generator has to be a standalone function.

class PdfTokenizer:
    def __init__(self, ...):
        # set up initial state

    def getTokens(self):
        while whatever:
            yield token

    def seek(self, newPosition):
        # change state here

# usage:
pdf = PdfTokenizer('myfile.pdf', ...)
for token in pdf.getTokens():
    # do stuff...

    if I need to change position:
        pdf.seek(...)

Easy as pie! :-)
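
A runnable toy version of that outline, as a sketch only: the class name `ToyTokenizer`, the string input, and the one-character tokens are assumptions made so the example is self-contained (Python 3 syntax).

```python
class ToyTokenizer:
    """Toy stand-in: the 'tokens' are single characters of a string."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get_tokens(self):
        # The position lives on the instance, so seek() takes effect on
        # the next iteration even while the generator is suspended.
        while self.pos < len(self.data):
            token = self.data[self.pos]
            self.pos += 1
            yield token

    def seek(self, new_position):
        self.pos = new_position

tok = ToyTokenizer("abcdef")
seen = []
for token in tok.get_tokens():
    seen.append(token)
    if token == "b":
        tok.seek(4)   # jump ahead once
```

This works because the generator keeps no state other than self.pos; a generator that also remembers a parsing mode at its suspension point would need more care when seeking.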

-Peter


Controlling a generator the pythonic way

2005-06-11 Thread Thomas Lotze
Hi,

I'm trying to figure out what is the most pythonic way to interact with
a generator.

The task I'm trying to accomplish is writing a PDF tokenizer, and I want
to implement it as a Python generator. Suppose all the ugly details of
tokenizing PDF can be handled (such as embedded streams of arbitrary
binary content). There remains one problem, though: In order to get
random file access, the tokenizer should not simply spit out a series of
tokens read from the file sequentially; it should rather be possible to
point it at places in the file at random.

I can see two possibilities to do this: either the current file position
has to be read from somewhere (say, a mutable object passed to the
generator) after each yield, or a new generator needs to be instantiated
every time the tokenizer is pointed to a new file position.

The first approach has two disadvantages: the pointer value is exposed,
and, due to the complex rules for hacking a PDF to tokens, there will be
a lot of yield statements in the generator code, which would make for a
lot of pointer assignments. This seems ugly to me.

The second approach is cleaner in that respect, but pointing the
tokenizer to some place has now the added semantics of creating a whole
new generator instance. The programmer using the tokenizer now needs to
remember to throw away any references to the generator each time the
pointer is reset, which is also ugly.

Does anybody here have a third way of dealing with this? Otherwise,
which ugliness is the more pythonic one?

Thanks a lot for any ideas.

-- 
Thomas