On Fri, 23 Mar 2012 17:00:23 +0100, Kiuhnm wrote:

> I've been writing a little library for handling streams as an excuse for
> doing a little OOP with Python.
>
> I don't share some of the views on readability expressed on this ng.
> Indeed, I believe that a piece of code may very well start as complete
> gibberish and become a pleasure to read after some additional
> information is provided.
[...]
>     numbers - push - avrg - 'med' - pop - filter(lt('med'), ge('med'))\
>         - ['same', 'same'] - streams(cat) - 'same'
>
> Ok, we're at the "complete gibberish" phase.
>
> Time to give you the "additional information".
There are multiple problems with your DSL. Having read your explanation,
and subsequent posts, I think I understand the data model, but the syntax
itself is not very good and far from readable. It is just too hard to
reason about the code.

Your syntax conflicts with established, far more common, use of the same
syntax: you use - to mean "call a function" and | to join two or more
streams into a flow.

You also use () for calling functions, and the difference between - and ()
isn't clear. So a mystery there -- your DSL seems to have different
function syntax, depending on... what?

The semantics are unclear even after your examples. To understand your
syntax, you give examples, but to understand the examples, the reader
needs to understand the syntax. That suggests that the semantics are
unclear even in your own mind, or at least too difficult to explain in
simple examples.

Take this example:

> Flows can be saved (push) and restored (pop) :
>     [1,2,3,4] - push - by(2) - 'double' - pop | val('double')
>         <=> [1,2,3,4] | [2,4,6,8]

What the hell does that mean? The reader initially doesn't know what *any*
of push, by(2), pop or val('double') means. All they see is an obfuscated
series of calls that starts with a stream as input, makes a copy of it,
and doubles the entries in the copy: you make FIVE function calls to
perform TWO conceptual operations. So the reader can't even map a function
call to a result.

With careful thought and further explanations from you, the reader (me)
eventually gets a mental model here. Your DSL has a single input which is
pipelined through a series of function calls by the - operator, plus a
separate stack. (I initially thought that, like Forth, your DSL was stack
based. But it isn't, is it?)

It seems to me that the - operator is only needed as syntactic sugar to
avoid using reverse Polish notation and an implicit stack. Instead of the
Forth-like:

    [1,2,3,4] dup 2 *

your DSL has an explicit stack, and an explicit - operator to call a
function.
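To make the Forth-like version concrete, here is a rough sketch of a stack
evaluator in Python. It is only an illustration: the rpn function and its
two words are hypothetical, not part of your library or of any real Forth
or RPL, and I'm assuming that * applied to a list and a number scales each
element.

```python
def rpn(program, stack=None):
    """Evaluate a list of tokens against an implicit stack; return the stack."""
    stack = [] if stack is None else stack
    for token in program:
        if token == "dup":
            # Duplicate the top of the stack (a copy, so the two are independent).
            stack.append(list(stack[-1]))
        elif token == "*":
            # Pop a number and a list, push the list scaled element-wise.
            # (Assumed semantics, chosen to match the doubling example.)
            factor = stack.pop()
            values = stack.pop()
            stack.append([v * factor for v in values])
        else:
            # Anything else is treated as a literal and pushed unchanged.
            stack.append(token)
    return stack


# [1,2,3,4] dup 2 *
result = rpn([[1, 2, 3, 4], "dup", 2, "*"])
print(result)  # [[1, 2, 3, 4], [2, 4, 6, 8]]
```

Every token either pushes a value or consumes what is already on the
stack, so no separate call operator is needed.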
Presumably "[1,2] push", without the - operator, would be a syntax error
in your DSL.

I think this is a good example of an inferior syntax. Contrast your:

    [1,2,3,4] - push - by(2) - 'double' - pop | val('double')

with the equivalent RPL:

    [1,2,3,4] dup 2 *

Now *that* is a pleasure to read, once you wrap your head around reverse
Polish notation and the concept of a stack. Which you need in your DSL
anyway, to understand push and pop.

You say that this is an "easier way to get the same result":

    [1,2,3,4] - [id, by(2)]

but it isn't, is it? The more complex example above ends up with two
streams joined in a single flow:

    [1,2,3,4]|[2,4,6,8]

whereas the shorter version using the magic "id" gives you a single stream
containing nested streams:

    [[1,2,3,4], [2,4,6,8]]

So, how could you make this more readable?

* Don't fight the reader's expectations. If they've programmed in Unix
  shells, they expect | as the pipelining operator. If they haven't, they
  probably will find >> easy to read as a dataflow operator. Either way,
  they're probably used to seeing a|b as meaning "or" (as in "this stream,
  or this stream") rather than the way you seem to be using it ("this
  stream, and this stream").

Here's my first attempt at improved syntax that doesn't fight the user:

    [1,2,3,4] >> push >> by(2) >> 'double' >> pop & val('double')

"push" and "pop" are poor choices of words. Push does not actually push
its input onto the stack, which would leave the input stream empty. It
makes a copy. You explain what they do:

    "Flows can be saved (push) and restored (pop)"

so why not just use SAVE and RESTORE as your functions? Or if they're too
verbose, STO and RCL, or my preference, store and recall.

    [1,2,3,4] >> store >> by(2) >> 'double' >> recall & val('double')

I'm still not happy with & for the join operator. I think that the use of
+ for concatenate and & for join is just one of those arbitrary choices
that the user will have to learn. Although I'm tempted to try using a
colon instead.

    [1,2,3]:[4,5,6]

would be a flow with two streams.
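For what it's worth, the >> and & spelling can be tried out in plain
Python with operator overloading. The following is a minimal sketch under
my own assumptions, not your implementation: the Flow wrapper, its save
stack, and the way names are stored are all invented for illustration, and
store, recall, by and val only imitate the behaviour described in your
post.

```python
class Flow:
    """Hypothetical flow: a list of streams, a save stack, and named flows."""

    def __init__(self, *streams, saved=(), names=None):
        self.streams = [list(s) for s in streams]
        self.saved = list(saved)
        self.names = dict(names or {})

    def _with(self, streams):
        # New flow with different streams but the same saved/named state.
        return Flow(*streams, saved=self.saved, names=self.names)

    def __rshift__(self, stage):
        if isinstance(stage, str):
            # A bare string names the current streams for later lookup.
            out = self._with(self.streams)
            out.names[stage] = [list(s) for s in self.streams]
            return out
        # Otherwise the stage is a function from Flow to Flow.
        return stage(self)

    def __and__(self, other):
        # Join: the streams of both sides, side by side in one flow.
        return self._with(self.streams + other(self).streams)


def store(flow):
    """Save a copy of the current streams; the flow itself is unchanged."""
    out = flow._with(flow.streams)
    out.saved.append([list(s) for s in flow.streams])
    return out


def recall(flow):
    """Replace the current streams with the most recently saved ones."""
    out = flow._with(flow.saved[-1])
    out.saved.pop()
    return out


def by(n):
    """Stage that multiplies every element of every stream by n."""
    return lambda flow: flow._with([[x * n for x in s] for s in flow.streams])


def val(name):
    """Stage that yields the streams previously stored under name."""
    return lambda flow: flow._with(flow.names[name])


result = Flow([1, 2, 3, 4]) >> store >> by(2) >> 'double' >> recall & val('double')
print(result.streams)  # [[1, 2, 3, 4], [2, 4, 6, 8]]
```

Python's >> binds more tightly than &, so the last line groups as
(... >> recall) & val('double'), which is the reading I intended. The
explicit Flow(...) at the start is only there because a plain list has no
>> operator of its own.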
I don't like the syntax for defining and using names. Here's a random
thought:

    [1,2,3,4] >> store >> by(2) >> @double >> recall & double

Use @name to store to a name, and the name alone to retrieve from it. But
I haven't given this too much thought, so it too might suck.

Some other problems with your DSL:

> A flow can be transformed:
>     [1,2] - f <=> [f(1),f(2)]

but that's not consistently true. For instance:

    [1,2] - push <=/=> [push(1), push(2)]

So the reader needs to know all the semantics of the particular function f
before being able to reason about the flow. Your DSL displays magic
behaviour, which is bad and makes it hard to read the code because the
reader may not know which functions are magic and which are not.

> Some functions are special and almost any function can be made special:
>     [1,2,3,4,5] - filter(isprime) <=> [2,3,5]
>     [[],(1,2),[3,4,5]] - flatten <=> [1,2,3,4,5]

You say that as if it were a good thing.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list