Re: [Cython] A high-level discussion of the Cython internals

Robert Bradshaw Thu, 08 May 2008 11:02:17 -0700

On May 8, 2008, at 10:17 AM, Dag Sverre Seljebotn wrote:

> This is in response to this statement by Robert:
>
>
> --  
>
> I think this is good to talk about now. Basically, you want to change
> Cython to use the "visitor pattern" rather than the recursive pattern
> that it currently uses. This has pros and cons, but I think this
> could be a good thing, and you are certainly convinced that its
> needed to get started on the GSoC stuff.
>
> --
>
> So, leaving the patch and implementation strategies aside for now I'm
> going to attempt discussing this from the mountain top. (If a lot of
> this seems trivial, all the better -- it can then start to serve as
> documentation for a direction, rather than my "private"  
> efforts :-). If
> not, it will be good to discuss it.)
>
> I honestly believe that any time spent discussing and implementing  
> this
> now can repay itself many times later on in saved developer hours  
> and a
> lower learning curve for new developers. If people agree with me in  
> how
> important this is, I think it would be good to perhaps spend some time
> at the beginning of dev1 talking and discussing this (with some slides
> etc.).


+1

>
> Here we go:
>
> Assertion 1: Development of Cython has a potential for becoming  
> stagnant
> (if it hasn't already) quickly.

In the past 18 months or so since I started first hacking on the  
code, development has increased at an exponential rate, far from  
becoming stagnant. This does not mitigate your point though--we need  
to have this discussion to minimize brittle interdependence, lessen  
the learning curve, and take things to the next level.

> My (subjective) felling is that the codebase has a somewhat high
> learning curve. If one wants to add a fundamentally new features (as
> opposed to quick fixes, which do currently happen very quickly -- but
> think type inference!) it looks like one has to be very careful to  
> keep
> many different effects of one's changes in the head at the same time.
>
>
> Assertion 2: The reason for this state can at least in some part be
> attributed to some confusion as to what the current code tree *is*.
>
> I believe that the reason that the codebase can be practical in the
> current state is tightly connected with how similar C and CPython is
> with Python as a language. The design, so to speak, arose in an
> historical happy accident, because the input and output are almost 1:1
> related. A Python function and a C function loosely correspond, and so
> FuncDefNode can serve the role as both the parsed element and the
> element due for C serialization. However there is no reason to assume
> that the 1:1 correspondance holds everywhere; and I think there is a
> tendency that "ugly hacks" already tend to arise in situations  
> where the
> correspondance in program structure diverges more. (To consider how  
> this
> might become worse, think about a "method template node".)
>
> If this was only a theoretically founded critisism one might simply
> dismiss it, but I believe the problems in assertion 1 comes from this
> very fact, and that they can be solved by dealing with this conceptual
> problem.
>
>
> The proposed solution: Work towards "The ideal solution" below in  
> tiny,
> incremental steps. All *new* major features are implemented with the
> ideal solution in mind, and old code refactored as far as is needed  
> for
> that to happen, but one doesn't start rewriting working and bugfixed
> code just for the sake of abstract purity.
>
>
> The ideal solution: A strict pipeline-based approach where the "data"
> (one or more trees, graphs or similar) is in different stages, with
> different phases transforming one into the other. I.e, always try  
> to fit
> what one does into the following example scheme:
>
> data stage:string of code -> phase:parsing ->
> data stage:syntax tree -> phase:expand with statements ->
> data stage:tree without with statements -> phase:create scopes ->
> data stage:tree with scopes -> phase:analyse types ->
> data stage:tree with ".type" attributes -> ...
> ...
> data stage:tree closely related to *structure* of C source ->  
> phase:output
>
> Whenever something doesn't quit fit in the scheme of naive input/ 
> output,
> split it up into more phases until it does :-)
>
> (Digression: Closely related to C structure above means, for instance,
> no inner functions. No with statements. No untyped variables. That  
> kind
> of thing. Classes may be included though, since one can create a 1:1
> correspondance between C code using the CPython API and a class; so it
> is not about language "features" as such but some deeper structure.)
>
> Note that all phases does not need to be visitors! For instance, the
> output-to-C phase will stay a recursive process for the foreseeable
> future. This is a way of thinking, not a recipe for implementation.  
> (And
> indeed, this way of thinking is already followed to a degree -- but  
> not
> everywhere, and we get problems at the spots where it is not.)
>
> A very important side-effect of this is that it lends itself to
> "implementing complex Cython statements as more simple Cython
> statements". (Though ideally "more simple Cython statements" can  
> also be
> "intermediate simpler instructions" that bridges the gap between  
> Cython
> and C in a more fine-grained way in order to give more code reuse than
> there is today. But I digress.)
>
> (Implementation notes: This does not necesarrily mean that the classes
> of the nodes changes between phases; one does whatever is most
> convenient. For instance, if one has T2 = expand_with_nodes(T1),  
> then T1
> is allowed to keep WithStatementNode, T2 is however not, but otherwise
> the trees are the same. In the same way, if T1 = type_analysis(T0),  
> then
> ExprNodes in T0 is not allowed to have the type attribute, while
> ExprNodes in T1 are required to have them. This can be a documentation
> matter or enforced (in a number of ways), that issue is better left  
> for
> later.).
>
>
> Final words: This might seem trivial. It is not. The reason is that
> currently, almost wherever you try to implement two simple phases f  
> and
> g one ends up having the result of f depend on what is done in g for
> some nodes, and the result of g depend on f for other nodes. And so
> neither f(g(T)) nor g(f(T)) can be made to work.
>
> The point is, this way of changing how one thinks mostly affects the
> compiler design as such, more than it is a specific style of coding.
> Visitors happen to be the most natural way to implement this, but are
> not the main point. I'd be happy (or, happier than I am today) with a
> cleanly defined phase-based recursive process.

+1 to this whole idea. The visitor pattern certainly has the  
advantage that one can add phases without changing every node (which  
is perhaps why, e.g., the "analyze" phase does so many things that  
could be logically separated).

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Re: [Cython] A high-level discussion of the Cython internals

Reply via email to