[sage-devel] Bookshelf of standard object behaviors and datastructures

Nicolas M. Thiery Thu, 10 Jul 2008 01:07:07 -0700

        Dear Sage developers,

Summary of this rather long e-mail which builds up on section "2.9
Mutability" of the programming guide.


 - In MuPAD, the memory behavior (and up to some point the data
   structure) of objects is essentially forced. This relieves the
   programmer of tedious memory management details. Luckily enough,
   this behavior turned out in practice to be appropriate for our (=
   MuPAD-Combinat people) purposes, at least most of the time. I could
   be biased by habits there, but can give some rationale.

 - Python being a general purpose language gives more freedom: the
   programmer of a class can choose, but has to implement, the memory
   behavior of the objects in that class.

 - I would like the best of both worlds in sage: that is to have a
   bookshelf of standard behaviors (and b.t.w. data structures) that
   the programmer of a class could choose from, without having to
   reimplement them.

 - I make below a first list of such behaviors (different communities
   use different terminology, so please tell me if I should call them
   differently). Most of them, I already discussed with Mike to check
   that they were implementable in python. Certainly 99% of the
   technical code is already there, either directly in sage or
   somewhere in the python world. So the question is about the exact
   specifications of each behavior, the guidelines to choose between
   behaviors, the choice among existing implementations, and the
   programmer's interface.


An example:

Patrick is a young mathematician without much programming
background. He wants to implement a class whose instances are integer
compositions, that is lists of positive integers, like [3,1,4]. First
he has to take certain decisions:

 - data structure: obvious here, a list (in the python sense)
 - element or parent: obvious here: element
 - memory behavior: i.e. what happens if one does:

       sage: c = Composition([3,1,4])
       sage: d = c
       sage: d[1] = 2
       sage: d
       [3,2,4]
       sage: c
       [3,?,4]

   Option 1: Compositions have a reference semantic: the last result
   is [3,2,4]. That is, upon the assignment d = c, d becomes a
   reference to the same object as c; modifying d also modifies
   c. That's the behavior of lists in python.

   Option 2: Compositions are immutable. The third command is just not
   valid. This is the suggested default behavior in sage.

   Option 3: Composition have copy semantic: the last result is
   [3,1,4]. Upon the assignment d = c, d becomes a copy of c, so
   modifying d does not modify c.

Following the sage-combinat guidelines, Patrick chooses copy semantic
(see below why). Now he picks up the already implemented data
structure and behavior from the bookshelf, and just has to implement
the mathematical methods associated to compositions. His code will
look like:

    class Composition ( list, Element, CopySemantic):

         def size(self):
             ...

         def conjugate(self):
             ...

Comments:

 - It is essential for us to make it that easy to implement new
   classes, and factor out once for all technicalities (immutability
   may be easy to implement, but not copy semantic).  Our contributers
   won't have the time, and often not the technical expertise for
   those. Besides, factoring out promotes concise code, and consistent
   behaviors.

 - I used an inheritance syntax above, but that's just for the sake of
   the example. All I really care about is that it should be easy to
   have Composition get its data structure and copy semantic for free
   directly from the bookshelf. The exact syntax and mechanism is open for
   discussions (typically inheriting directly from list and Element
   probably breaks currently, so something else needs to be done).


==============================================================================
More on "reference semantic" versus "immutability" versus "copy semantic".

To begin with, a followup on the example above:

    sage: def f(d):
    ...       d[1] = 2
    ...
    sage: f(c)
    sage: c
    [2,?,4]

In this example, there is an implicit assignment of c to the local
variable d of f. My belief is that what happens there should be
consistent with what happens upon a usual assignment; in other words,
in a call f(c), it is not the function f who should decide whether the
argument c is passed by copy or by reference, but the object c. Of
course, this displays my bias toward functional programming, where
results are always passed as return values, not through reference
effect on the arguments.

 - Reference semantic:
   Pros: avoid unnecessary copies; necessary in some specific
         algorithm to achieve optimal complexity

   Cons: non-locality: it is hard to keep track of all the ways an
         object you are manipulating in a given lexical scope can be
         modified: it might have been passed to another function which
         passed it to another which passed it to another defined in a
         very different lexical scope, and which could inadvertently
         modify it. This is error prone, and hard to debug (no
         const-safeness to help us).
   
   I guess that's why immutability is promoted in sage

 - Immutability:
   Pros: easy to implement, safe
   Cons (combinatorics context): most of the time, new objects are
         constructed from existing ones by small local modifications
         (think exchanging two adjacent entries in a permutation, or
         grafting a tree onto another one). Immutability imposes
         to reconstruct the new object from scratch, usually pushing
         the programmer to reveal the internal data structure
         (look e.g. at ishift in combinat/permutation.py).

 - Copy semantic:
   Pros: safety
         Does "what you expect"
         With appropriate copy-on-write, can achieve most of the time
         the same efficiency has for objects with reference semantic
          
   Cons: more complicated to implement; the copy-on-write part can be
         quite tricky, as we want to have instructions like:

              c[i] = 1
         or
              c = c.modifying_method()

         *not* to trigger a copy of c if there is no other reference
         to c elsewhere.
 
==============================================================================
Preliminary bookshelf of typical behaviors:

 - Unique representation:
        class UniqueRepresentation
            def __new__(...):
                does whatever is appropriate to ensure unique representation

            ... equality test using the unique id ...
 

        class MyParent (Parent, UniqueRepresentation):
            def __init__(...):


 - Copy semantic:
        see above

 - Object with cache:

        Goal: have a very short mantra to cache the results of
        complicated methods. For memory management, caching is best
        stored inside the object (it will pass out of scope at the
        same time as the object). The cache should be ignored in
        equality tests. Typically the object will be semantically
        immutable: caching a value does not change the object;
        otherwise, all cached value (like the hash value!) should be
        automatically cleared upon modification. We may want some
        parts of the cache to be ignored upon pickling as well.

        class MyGroup(Parent, ObjectWithCache):

             @remember("is_solvable")
             def is_solvable(self):
                 return ...
   
             @remember("__hash__")
             def __hash__(self):
                 return ...

             @remember("my_function", pickle=false)
             def my_function(self):
                 """
                 Return a function that is costly to build but can't be pickled
                 so we will just reconstruct it after pickling/unpickling
                 """
             ...

Typically, our Elements will have Copy Semantic, whereas our Parents
will be Semantically Immutable, have Unique Representation, and be
Objects With Cache (that's how it is in MuPAD, domains playing the
role of parents).

By the way, here are some typical data structures that should be on
the bookshelf as well:

 - vectors with entries indexed by any set (possibly infinite)
   (essentially done)
 - Sparse matrices with rows and columns indexed by any set
 - Lists, sets, multisets, which are Elements at the same time
 - Tableaux, trees, ...

Suggestions and comments extremely welcome! I plan to post in the
coming days one ticket per behavior, with a detailed test suite. Btw:
Adrien Boussicault (student in Marne-la-vallée, in CC) offered to
participate to the implementation once the specs are precise.

Cheers,
                                        Nicolas
-- 
Nicolas M. Thiery "Isil" <[EMAIL PROTECTED]>
http://Nicolas.Thiery.name/

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---

[sage-devel] Bookshelf of standard object behaviors and datastructures

Reply via email to