Re: Stream programming

Kiuhnm Sat, 24 Mar 2012 04:13:24 -0700

On 3/24/2012 4:23, Steven D'Aprano wrote:

On Fri, 23 Mar 2012 17:00:23 +0100, Kiuhnm wrote:

I've been writing a little library for handling streams as an excuse for
doing a little OOP with Python.

I don't share some of the views on readability expressed on this ng.
Indeed, I believe that a piece of code may very well start as complete
gibberish and become a pleasure to read after some additional
information is provided.

[...]

numbers - push - avrg - 'med' - pop - filter(lt('med'), ge('med'))\
      - ['same', 'same'] - streams(cat) - 'same'

Ok, we're at the "complete gibberish" phase.

Time to give you the "additional information".


There are multiple problems with your DSL. Having read your explanation,
and subsequent posts, I think I understand the data model, but the syntax
itself is not very good and far from readable. It is just too hard to
reason about the code.

Your syntax conflicts with established, far more common, use of the same
syntax: you use - to mean "call a function" and | to join two or more
streams into a flow.

You also use () for calling functions, and the difference between - and
() isn't clear. So a mystery there -- your DSL seems to have different
function syntax, depending on... what?

The semantics are unclear even after your examples. To understand your
syntax, you give examples, but to understand the examples, the reader
needs to understand the syntax. That suggests that the semantics are
unclear even in your own mind, or at least too difficult to explain in
simple examples.

Take this example:

Flows can be saved (push) and restored (pop) :
    [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
        <=>  [1,2,3,4] | [2,4,6,8]
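Here is a minimal Python sketch of one possible reading of the push/pop example. It is reconstructed only from the rules stated in this thread and is not Kiuhnm's library. Flow, _Keyword and _Ref are hypothetical names, and storing only the first stream under a string name is an assumption. The sketch relies on two facts about Python. First, lists have no __sub__, so "list - push" falls back to push.__rsub__. Second, - binds tighter than |, so the join with val('double') happens last.

```python
# Hypothetical reconstruction of the push/pop example; not Kiuhnm's library.

class _Keyword:
    """Marker object for the special words push and pop."""

    def __init__(self, name):
        self.name = name

    def __rsub__(self, data):
        # "[1,2,3,4] - push": lists have no __sub__, so Python falls back to
        # this method on the right operand; wrap the list in a Flow first.
        return Flow([data]) - self


push = _Keyword("push")
pop = _Keyword("pop")


class _Ref:
    """A reference to a named stream, created by val('name')."""

    def __init__(self, name):
        self.name = name


def val(name):
    return _Ref(name)


def by(k):
    """Return a function that multiplies an item by k."""
    return lambda x: x * k


class Flow:
    def __init__(self, streams):
        self.streams = [list(s) for s in streams]
        self.stack = []   # flows saved by push
        self.names = {}   # streams saved under a name

    def __sub__(self, op):
        if op is push:
            # Save a copy; the current flow continues unchanged.
            self.stack.append([list(s) for s in self.streams])
        elif op is pop:
            # Discard the current flow and restore the last saved one.
            self.streams = self.stack.pop()
        elif isinstance(op, str):
            # A bare string stores the first stream under that name (assumption).
            self.names[op] = list(self.streams[0])
        else:
            # An ordinary function is applied to every item of every stream.
            self.streams = [[op(x) for x in s] for s in self.streams]
        return self

    def __or__(self, ref):
        # Join: add the named stream to the flow as an extra stream.
        self.streams.append(list(self.names[ref.name]))
        return self
```

With these definitions, [1, 2, 3, 4] - push - by(2) - 'double' - pop | val('double') returns a Flow whose streams attribute is [[1, 2, 3, 4], [2, 4, 6, 8]]. That matches the right-hand side of the example.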


What the hell does that mean? The reader initially doesn't know what
*any* of push, by(2), pop or val('double') means. All they see is an
obfuscated series of calls that starts with a stream as input, makes a
copy of it, and doubles the entries in the copy: you make FIVE function
calls to perform TWO conceptual operations. So the reader can't even map
a function call to a result.

With careful thought and further explanations from you, the reader (me)
eventually gets a mental model here. Your DSL has a single input which is
pipelined through a series of function calls by the - operator, plus a
separate stack. (I initially thought that, like Forth, your DSL was stack
based. But it isn't, is it?)

It seems to me that the - operator is only needed as syntactic sugar to
avoid using reverse Polish notation and an implicit stack. Instead of the
Forth-like:

[1,2,3,4] dup 2 *

your DSL has an explicit stack, and an explicit - operator to call a
function. Presumably "[1,2] push" would be a syntax error.

I think this is a good example of an inferior syntax. Contrast your:

[1,2,3,4] - push - by(2) - 'double' - pop | val('double')

with the equivalent RPL:

[1,2,3,4] dup 2 *
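For comparison, here is a toy evaluator for the RPL line above. It assumes the tokens arrive already split into Python values and that * multiplies a list element-wise. Both are assumptions made for illustration, and rpn is a hypothetical helper. The evaluator shows that the implicit stack ends up holding both the original list and the doubled list.

```python
import copy


def rpn(tokens):
    """Evaluate a pre-tokenized RPN program on an explicit stack."""
    stack = []
    for tok in tokens:
        if tok == "dup":
            # Duplicate the top of the stack (a copy, not an alias).
            stack.append(copy.copy(stack[-1]))
        elif tok == "*":
            b = stack.pop()
            a = stack.pop()
            # Assumed semantics: a list times a number scales every item.
            stack.append([x * b for x in a] if isinstance(a, list) else a * b)
        else:
            # Anything else is a literal and is simply pushed.
            stack.append(tok)
    return stack
```

Here rpn([[1, 2, 3, 4], "dup", 2, "*"]) returns [[1, 2, 3, 4], [2, 4, 6, 8]], which is the same pair of streams the push/pop version builds.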


I was just explaining how push and pop work.
I also said that
  [1,2,3,4] - [id,by(2)]
would be the recommended way to do it.

Now *that* is a pleasure to read, once you wrap your head around reverse
Polish notation and the concept of a stack. Which you need in your DSL
anyway, to understand push and pop.

I don't see why. Push and pop are not needed. They're just handy, mainly to modify a flow, collect a result, and go back to how the flow was before the push.

It has nothing to do with RPN (which RPL is based on).

You say that this is an "easier way to get the same result":

[1,2,3,4] - [id, by(2)]

but it isn't, is it? The more complex example above ends up with two
streams joined in a single flow:

[1,2,3,4]|[2,4,6,8]

whereas the shorter version using the magic "id" gives you a single
stream containing nested streams:

[[1,2,3,4], [2,4,6,8]]


Says who?

Here are the rules again:
A flow can be transformed:
  [1,2] - f <=> [f(1),f(2)]
  ([1,2] | [3,4]) - f <=> [f(1,3),f(2,4)]
  ([1,2] | [3,4]) - [f] <=> [f(1),f(2)] | [f(3),f(4)]
  ([1,2] | [3,4]) - [f,g] <=> [f(1),f(2)] | [g(3),g(4)]
  [1,2] - [f,g] <=> [f(1),f(2)] | [g(1),g(2)]

Read the last line.

What's very interesting is that [f,g] is an iterable as well, so your functions can be generated as needed.
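Here is a minimal Python sketch of the five transformation rules above. It assumes a hypothetical Flow class that holds equal-length streams side by side. Plain lists do not overload | or - this way, so the sketch wraps the first operand explicitly. The thread does not show how Kiuhnm's library does that wrapping. Any non-callable operand is treated as an iterable of functions, which covers the point that [f,g] can be generated as needed.

```python
class Flow:
    """Sketch of a flow: several equal-length streams processed side by side."""

    def __init__(self, *streams):
        self.streams = [list(s) for s in streams]

    def __or__(self, other):
        # Join: "a | b" puts the streams of both operands side by side.
        if not isinstance(other, Flow):
            other = Flow(other)
        return Flow(*self.streams, *other.streams)

    def __sub__(self, op):
        if callable(op):
            # [1,2] - f and ([1,2] | [3,4]) - f: call f once per position,
            # with one argument taken from each stream.
            return Flow([op(*items) for items in zip(*self.streams)])
        funcs = list(op)  # any iterable of functions, e.g. a generator
        if len(self.streams) == 1:
            # [1,2] - [f,g]: every function gets its own copy of the stream.
            src = self.streams[0]
            return Flow(*([f(x) for x in src] for f in funcs))
        if len(funcs) == 1:
            # (a | b) - [f]: the single function is reused for each stream.
            funcs = funcs * len(self.streams)
        # (a | b) - [f,g]: the i-th function transforms the i-th stream.
        return Flow(*([f(x) for x in s] for f, s in zip(funcs, self.streams)))
```

For example, (Flow([1, 2]) | [3, 4]) - (lambda a, b: a + b) gives the single stream [4, 6]. Flow([1, 2]) - (lambda x, k=k: x * k for k in (1, 2, 3)) gives three streams: [1, 2], [2, 4] and [3, 6]. The parentheses around the joined flow are required because - binds tighter than |.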

So, how could you make this more readable?

* Don't fight the reader's expectations. If they've programmed in Unix
shells, they expect | as the pipelining operator. If they haven't, they
probably will find >> easy to read as a dataflow operator. Either way,
they're probably used to seeing a|b as meaning "or" (as in "this stream,
or this stream") rather than the way you seem to be using it ("this
stream, and this stream").

Here's my first attempt at improved syntax that doesn't fight the user:

[1,2,3,4] >> push >> by(2) >> 'double' >> pop & val('double')


There are problems with your syntax.
Mine:
[...]+[...] - f + [...] - g - h + [...] - i + [...]
Yours:
((([...]+[...] >> f) + [...] >> g >> h) + [...] >> i) + [...]
I first tried to use '<<' and '>>' but '+' and '-' are much better.
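You can check the precedence argument with Python's own parser. The sketch below uses parenthesize, a hypothetical helper built on the standard ast module, to show how a chain of operators is grouped. + and - share a precedence level and group left to right. >> binds more loosely than +, so in the >> version a following + attaches to the function rather than to the flow.

```python
import ast

# Symbols for the binary operators discussed in the thread.
_SYMBOLS = {ast.Add: "+", ast.Sub: "-", ast.RShift: ">>",
            ast.BitOr: "|", ast.BitAnd: "&"}


def parenthesize(src):
    """Return src with every binary operation wrapped in parentheses,
    exactly as Python's parser groups it. Only plain names are supported."""

    def walk(node):
        if isinstance(node, ast.BinOp):
            op = _SYMBOLS[type(node.op)]
            return f"({walk(node.left)} {op} {walk(node.right)})"
        if isinstance(node, ast.Name):
            return node.id
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return walk(ast.parse(src, mode="eval").body)
```

Here parenthesize("a + b - f + c - g") gives "((((a + b) - f) + c) - g)", while parenthesize("a + b >> f + c >> g") gives "(((a + b) >> (f + c)) >> g)". In the second result, c is added to f instead of to the flow.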

"push" and "pop" are poor choices of words. Push does not actually push
its input onto the stack, which would leave the input stream empty. It
makes a copy. You explain what they do:

Why should push move and not copy? In asm and OpenGL they copy, for instance.

"Flows can be saved (push) and restored (pop)"

so why not just use SAVE and RESTORE as your functions? Or if they're too
verbose, STO and RCL, or my preference, store and recall.


Because that's not what they do.

push and pop actually push and pop, i.e. they can be nested and work as expected.
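Here is a small Python sketch of the copy-and-nest behaviour described here. It uses a hypothetical FlowState class with method calls instead of the DSL operators. push saves a copy of the current data, much as glPushMatrix duplicates the current matrix in OpenGL, so the flow continues unchanged. pop restores saved copies in last-in, first-out order.

```python
class FlowState:
    """Hypothetical holder for one stream plus a stack of saved copies."""

    def __init__(self, data):
        self.current = list(data)
        self._saved = []

    def push(self):
        # Copy, do not move: the current data stays available.
        self._saved.append(list(self.current))
        return self

    def pop(self):
        # Restore the most recently saved copy (LIFO, so nesting works).
        self.current = self._saved.pop()
        return self

    def map(self, f):
        self.current = [f(x) for x in self.current]
        return self
```

After s = FlowState([1, 2, 3]) and s.push().map(lambda x: x * 10).push().map(lambda x: x + 1), s.current is [11, 21, 31]. The first pop gives [10, 20, 30] and the second gives [1, 2, 3].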

[1,2,3,4] >> store >> by(2) >> 'double' >> recall & val('double')

I'm still not happy with & for the join operator. I think that the use of
+ for concatenate and & for join is just one of those arbitrary choices
that the user will have to learn. Although I'm tempted to try using a
colon instead.

[1,2,3]:[4,5,6]

would be a flow with two streams.

I can't see a way to overload ':' in Python. There are also technical limitations.
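Python has no special method for ':'. A colon between two list displays is not a valid expression at all, so the parser rejects the code before any overloading could come into play. Here is a quick check with the built-in compile, where is_valid_expression is a hypothetical helper.

```python
def is_valid_expression(src):
    """Return True if src parses as a single Python expression."""
    try:
        compile(src, "<dsl>", "eval")
    except SyntaxError:
        return False
    return True
```

Here is_valid_expression("[1,2,3] & [4,5,6]") is True, although evaluating that on plain lists raises TypeError. is_valid_expression("[1,2,3]:[4,5,6]") is False.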

I don't like the syntax for defining and using names. Here's a random
thought:

[1,2,3,4] >> store >> by(2) >> @double >> recall & double

Use @name to store to a name, and the name alone to retrieve from it. But
I haven't given this too much thought, so it too might suck.


The problem, again, is Python's limited support for defining DSLs.

At this point, one would have to interpret command strings. I was trying to avoid an interpreter on an interpreter.

Some other problems with your DSL:

A flow can be transformed:
    [1,2] - f <=> [f(1),f(2)]


but that's not consistently true. For instance:

[1,2] - push <=/=> [push(1), push(2)]

push is a special function (a keyword). It's clear what it does. It's just an exception to the general rule.

So the reader needs to know all the semantics of the particular function
f before being able to reason about the flow.


No, he only has to know the special functions. Those are practically keywords.

Some functions are special and almost any function can be made special:
    [1,2,3,4,5] - filter(isprime) <=> [2,3,5]
    [[],(1,2),[3,4,5]] - flatten <=> [1,2,3,4,5]


You say that as if it were a good thing.

It is, because it's never implicit. For instance, isprime is a filter. flatten is a special builtin function (a keyword).
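Here is one way such "special" functions could work in Python, sketched under assumptions. Filter and _Flatten are hypothetical classes whose __rsub__ takes over when the left operand is a plain list. The original spells the wrapper filter. The sketch uses Filter instead to avoid shadowing Python's built-in filter.

```python
from itertools import chain


class Filter:
    """Hypothetical marker: the predicate drops items instead of mapping them."""

    def __init__(self, pred):
        self.pred = pred

    def __rsub__(self, data):
        # "list - Filter(p)": lists have no __sub__, so Python calls this.
        return [x for x in data if self.pred(x)]


class _Flatten:
    """Hypothetical keyword: concatenates one level of nested iterables."""

    def __rsub__(self, data):
        return list(chain.from_iterable(data))


flatten = _Flatten()


def isprime(n):
    """Trial division; good enough for small illustrative inputs."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))
```

With these definitions, [1, 2, 3, 4, 5] - Filter(isprime) evaluates to [2, 3, 5], and [[], (1, 2), [3, 4, 5]] - flatten evaluates to [1, 2, 3, 4, 5].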


Kiuhnm
--
http://mail.python.org/mailman/listinfo/python-list
