Requesting comments on the following pre-PEP. pybench runs both with and without the patch applied would also be appreciated.
- Chris R

Title: Default Argument Expressions
Author: Christopher Rebert <cvrebertatgmaildotcom>
Status: Draft
Type: Standards Track
Requires: 3000
Python-Version: 3.0

Abstract

    This PEP proposes new semantics for default arguments to remove
    boilerplate code associated with non-constant default argument values,
    allowing them to be expressed more clearly and succinctly.
    Specifically, all default argument expressions are re-evaluated at
    each call, as opposed to just once at definition-time as they are now.


Motivation

    Currently, to write functions using non-constant default arguments, one
    must use the idiom:

        def foo(non_const=None):
            if non_const is None:
                non_const = some_expr
            #rest of function

    or equivalent code.  Naive programmers desiring mutable default
    arguments often make the mistake of writing the following:

        def foo(mutable=some_expr_producing_mutable):
            #rest of function

    However, this does not work as intended, as
    'some_expr_producing_mutable' is evaluated only *once* at
    definition-time, rather than once per call at call-time.  As a result,
    all calls to 'foo' that rely on the default share the same object, so
    a change made to it during one call (for example, appending to a
    default list) is visible in every later call.  This necessitates the
    previously mentioned idiom.  This unintuitive behavior is such a
    frequent stumbling block for newbies that it is present in at least 3
    lists of Python's deficiencies [0] [1] [2].  Python's tutorial even
    mentions the issue explicitly [3].
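
    The shared-default behavior described above can be observed directly
    in current Python.  In this minimal example (the name 'append_to' is
    illustrative only), the list default is created once, when 'def' runs,
    and is then reused by every call that omits 'target':

```python
def append_to(item, target=[]):
    # 'target' defaults to one list object, created when 'def' executed.
    target.append(item)
    return target

first = append_to(1)
second = append_to(2)

print(first is second)         # True: both calls got the same list
print(second)                  # [1, 2]
print(append_to.__defaults__)  # ([1, 2],): the stored default was mutated
```
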

    There are currently few, if any, known good uses of the current
    behavior of mutable default arguments.  The most common one is to
    preserve function state between calls.  However, as one of the lists
    [2] comments, this purpose is much better served by decorators,
    classes, or (though less preferred) global variables.

    Therefore, since the current semantics aren't useful for non-constant
    default values and an idiom is necessary to work around this
    deficiency, why not change the semantics so that people can write what
    they mean more directly, without the tedious boilerplate?  Removing
    this idiom would help make code more readable and self-documenting.


Rationale

    The discussion referenced herein is based on two threads [4] [5] on the
    python-ideas mailing list.

    Originally, it was proposed that all default argument values be
    deep-copied from the original (evaluated at definition-time) at each
    invocation of the function where the default value was required.
    However, this doesn't take into account default values that are not
    literals, e.g. function calls, subscripts, and attribute accesses,
    whose results can differ from call to call; copying the result
    computed at definition-time does not re-evaluate them.  Thus, the new
    idea was to re-evaluate the default arguments at each call where they
    were needed.
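
    As a rough illustration of why copying was judged insufficient, the
    originally proposed copy-based semantics can be approximated in
    current Python 3 with a decorator.  The 'deepcopy_defaults' decorator
    and the functions below are hypothetical, and the decorator replaces
    '__defaults__' before every call, so it is a single-threaded sketch
    only.  It shows that copying gives each call a fresh mutable value,
    but cannot re-run a default expression such as a function call:

```python
import copy
import functools
import itertools

def deepcopy_defaults(func):
    """Hypothetical decorator approximating the originally proposed
    semantics: every call sees fresh deep copies of the default values
    that were computed once, at definition time."""
    originals = func.__defaults__ or ()

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Swap in new copies before each call (not thread-safe; a sketch).
        func.__defaults__ = copy.deepcopy(originals)
        return func(*args, **kwargs)
    return wrapper

@deepcopy_defaults
def append_to(item, target=[]):
    target.append(item)
    return target

counter = itertools.count()

@deepcopy_defaults
def ticket(number=next(counter)):
    # next(counter) ran only once, at definition time; copying its
    # result (0) cannot run it again.
    return number

print(append_to(1), append_to(2))  # [1] [2]  (fresh mutable default)
print(ticket(), ticket())          # 0 0      (expression not re-evaluated)
```
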

    There was some concern over the possible performance hit this could
    cause, and whether there should be new syntax so that code could use
    the existing semantics for performance reasons.  Some of the proposed
    syntaxes were:

        def foo(bar=<baz>):
            #code

        def foo(bar=new baz):
            #code

        def foo(bar=fresh baz):
            #code

        def foo(bar=separate baz):
            #code

        def foo(bar=another baz):
            #code

        def foo(bar=unique baz):
            #code

        def foo(bar or baz):
            #code

    where the keyword (or angle brackets) would indicate that the
    default value 'baz' of parameter 'bar' should use the new semantics.
    Other parameters would continue to use the old semantics.

    Alternately, the new semantics could be the default, with the old
    semantics accessible using:

        def foo(bar=once baz):
            #code

    where 'once' indicates the old default argument semantics.  A similar
    idea is mentioned in PEP 3103 [6] under "Option 4".  However, having
    two sets of semantics could be confusing, and leaving in the old
    semantics might be considered premature optimization.  So this PEP
    proposes having just one set of semantics.  Refactorings to deal with
    the possible performance hit from the new semantics are discussed
    later.

    A more radical proposed solution was to restrict default arguments to
    being hash()-able values, thus theoretically restricting default
    arguments to immutable values only.  While this would solve the
    newbie-confusion issue, it does not suggest a better way to specify
    that a default value should be recomputed at every function call.
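
    The word 'theoretically' matters here: hashability does not imply
    immutability.  A hypothetical definition-time check (the decorator and
    class names are illustrative) rejects a list default but still accepts
    an ordinary user-defined instance, which is hashable by identity yet
    freely mutable, so the shared-state pitfall survives:

```python
def require_hashable_defaults(func):
    """Hypothetical check emulating the 'hash()-able defaults only' idea:
    reject the function at definition time if any default is unhashable."""
    defaults = list(func.__defaults__ or ())
    defaults += list((getattr(func, '__kwdefaults__', None) or {}).values())
    for value in defaults:
        hash(value)  # raises TypeError for lists, dicts, sets, ...
    return func

class Bag(object):
    """Mutable, yet hashable by identity (default object.__hash__)."""
    def __init__(self):
        self.items = []

try:
    @require_hashable_defaults
    def f(x=[]):
        return x
    rejected = False
except TypeError:
    rejected = True

@require_hashable_defaults
def g(x=Bag()):      # accepted, but still a shared mutable default
    x.items.append(1)
    return x

first = g()
second = g()
print(rejected)         # True
print(first is second)  # True
print(second.items)     # [1, 1]
```
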

    Throughout the discussion, several decorators were shown as
    alternatives to the aforementioned idiom.  These do allow the
    programmer to express their intent more clearly, at the cost of some
    extra complexity.  Also, no one decorator could be applied to all
    situations; the programmer would have to figure out which one to use
    each time.  This PEP's proposed solution would make these decorators
    unnecessary and provide a more general solution to the issue.

    The question was also raised as to whether the problem this PEP seeks
    to solve is significant enough to warrant a language change.  The
    statistics in the Compatibility Issues section should help demonstrate
    the necessity of the changes that this PEP proposes.
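
    For concreteness, here is one possible shape of the decorator
    alternatives mentioned above, written for current Python 3.  It is a
    hypothetical sketch, not one of the decorators actually posted in the
    threads: each keyword names a parameter and a zero-argument factory
    that is called whenever the caller omits that parameter.

```python
import functools
import inspect

def default_factories(**factories):
    """Hypothetical decorator: for each named parameter the caller omits,
    call its zero-argument factory to build a fresh default."""
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind_partial(*args, **kwargs)
            for name, factory in factories.items():
                if name not in bound.arguments:
                    bound.arguments[name] = factory()
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorate

@default_factories(target=list)
def append_to(item, target=None):
    target.append(item)
    return target

print(append_to(1), append_to(2))      # [1] [2]
shared = []
print(append_to(3, shared) is shared)  # True: explicit argument is used
```

    Even this small version needs signature binding, and the default now
    lives in the decorator rather than in the parameter list, which
    illustrates the extra complexity such decorators carry.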

    The next question was exactly how default argument expressions should
    be scoped.  By way of demonstration:

        a = 42
        def foo(b=a):
            a = 3.14

    Now, does the variable 'a' in the default expression for 'b' refer to
    the lexical variable 'a', or the local variable 'a'?  If it refers to a
    local variable, then this code is basically equivalent to:

        a = 42
        def foo(b=None):
            if b is None:
                b = a
            a = 3.14

    in which case 'a' is referenced before it has been assigned to in the
    function, causing an UnboundLocalError.  The alternative is to have
    Python treat 'a' within the function's body differently from the 'a'
    in the default expression.  In this case, the code would behave as if
    it were:

        a = 42
        def foo(b=None):
            if b is None:
                b = __a
            a = 3.14

    where __a indicates Python 'magically' treating it as a lexical
    variable that is distinct from the local variable 'a'.  This would
    increase backward-compatibility, allowing you to use a lexical
    variable with the same name as a local variable as a default
    expression, which is more similar to Python's current behavior.
    However, this would complicate the semantics of default expressions.
    For simplicity's sake, this PEP endorses treating variables in default
    expressions as normal function variables.  Suggestions for dealing
    with the incompatibilities this would introduce are discussed later.
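
    Because the idiom form of the scoping example is ordinary Python, the
    consequence of the local-variable interpretation can be checked today.
    This sketch reuses the names from that example; calling 'foo' without
    an argument triggers the UnboundLocalError, while passing 'b'
    explicitly skips the failing line:

```python
a = 42

def foo(b=None):
    if b is None:
        # 'a' is a local of foo (it is assigned below), so this read fails.
        b = a
    a = 3.14
    return b

try:
    foo()
    raised = False
except UnboundLocalError:
    raised = True

print(raised)   # True
print(foo(1))   # 1: the default is not needed, so 'b = a' never runs
```
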


Specification

    The current semantics for default arguments are replaced by the
    following semantics:
        - Whenever a function is called, and the caller does not provide a
        value for a parameter with a default expression, the parameter's
        default expression is evaluated in the function's scope.  The
        resulting value is then assigned to a local variable in the
        function's scope with the same name as the parameter.
        - The default argument expressions are evaluated before the body
        of the function.
        - The evaluation of default argument expressions proceeds in the
        same order as that of the parameter list in the function's
        definition.
        - Variables in a default expression are treated like normal
        function variables (i.e. global/lexical variables unless assigned
        to in the function).

    Given these semantics, it makes more sense to refer to default argument
    expressions rather than default argument values, as the expression is
    re-evaluated at each call, rather than just once at definition-time.
    Therefore, we shall do so hereafter.

    Demonstrative examples:
        #default argument expressions can refer to
        #variables in the enclosing scope...
        CONST = "hi"
        def foo(a=CONST):
            print a

        >>> foo()
        hi
        >>> CONST="bye"
        >>> foo()
        bye

        #...or even other arguments
        def ncopies(container, n=len(container)):
            return [container for i in range(n)]

        >>> ncopies([1, 2], 5)
        [[1, 2], [1, 2], [1, 2], [1, 2], [1, 2]]
        >>> ncopies([1, 2, 3])
        [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
        >>> #ncopies grabbed n from [1, 2, 3]'s length (3)

        #default argument expressions are arbitrary expressions
        def my_sum(lst):
            cur_sum = lst[0]
            for i in lst[1:]: cur_sum += i
            return cur_sum

        def bar(b=my_sum((["b"] * (2 * 3))[:4])):
            print b

        >>> bar()
        bbbb

        #default argument expressions are re-evaluated at every call...
        from random import randint
        def baz(c=randint(1,3)):
            print c

        >>> baz()
        2
        >>> baz()
        3

        #...but only when they're required
        def silly():
            print "spam"
            return 42

        def qux(d=silly()):
            pass

        >>> qux()
        spam
        >>> qux(17)
        >>> qux(d=17)
        >>> qux(*[17])
        >>> qux(**{'d':17})
        >>> #no output since silly() never called
        >>> #because d's value was specified in the calls

        #default argument expressions are evaluated in parameter-list order
        count = 0
        def next():
            global count
            count += 1
            return count - 1

        def frobnicate(g=next(), h=next(), i=next()):
            print g, h, i

        >>> frobnicate()
        0 1 2
        >>> #g, h, and i's default argument expressions are evaluated
        >>> #in the same order as in the parameter definition

        #variables in default expressions refer to lexical/global variables...
        j = "holy grail"
        def frenchy(k=j):
            print k
        #...unless assigned to in the function (or its parameters)
        def arthur(j="swallow", m=j):
            print m

        >>> frenchy()
        holy grail
        >>> arthur()
        swallow

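    Until an interpreter implements these semantics, the specification can
    be approximated in current Python 3 for experimentation.  In the
    hypothetical 'late_defaults' decorator below, each default is written
    as a function of the arguments bound so far; omitted arguments are
    filled in parameter-list order, and only when the caller does not
    supply them.  The 'ncopies' and 'frobnicate' examples above are
    rewritten to use it ('next' is renamed 'next_value' to avoid shadowing
    the builtin):

```python
import functools
import inspect

def late_defaults(**exprs):
    """Hypothetical emulation of the proposed default-argument rules.

    Each keyword maps a parameter name to a callable that receives a dict
    of the arguments bound so far and returns that parameter's default.
    """
    def decorate(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind_partial(*args, **kwargs)
            # Visit parameters in definition order; evaluate a default only
            # when the caller omitted that argument.
            for name in sig.parameters:
                if name in exprs and name not in bound.arguments:
                    bound.arguments[name] = exprs[name](dict(bound.arguments))
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorate

# A default that refers to an earlier argument.
@late_defaults(n=lambda args: len(args['container']))
def ncopies(container, n):
    return [container for i in range(n)]

count = [0]

def next_value():
    # Same counter as in the 'frobnicate' example, kept in a list.
    count[0] += 1
    return count[0] - 1

# Defaults evaluated left to right, once per call that needs them.
@late_defaults(g=lambda args: next_value(),
               h=lambda args: next_value(),
               i=lambda args: next_value())
def frobnicate(g, h, i):
    return g, h, i

print(ncopies([1, 2, 3]))  # [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(frobnicate())        # (0, 1, 2)
print(frobnicate(h=10))    # (3, 10, 4)
```
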

Compatibility Issues

    This change in semantics breaks code which uses mutable default
    argument expressions and depends on those expressions being evaluated
    only once.  It will also break code that, after the function is
    defined, rebinds variables used in its default expressions to new,
    incompatible values in an enclosing scope, since the default
    expressions will now see the new values.  Code relying on such
    behavior can be refactored from:

        def foo(bar=mutable):
            #code

    to

        state = mutable
        def foo(bar=state):
            #code

    or

        class Baz(object):
            state = mutable

            @classmethod
            def foo(cls, bar=cls.state):
                #code

    or

        from functools import wraps

        def stateify(states):
            def _wrap(func):
                @wraps(func)
                def _wrapper(*args, **kwds):
                    new_kwargs = states.copy()
                    new_kwargs.update(kwds)
                    return func(*args, **new_kwargs)
                return _wrapper
            return _wrap

        @stateify({'bar' : mutable})
        def foo(bar):
            #code
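
    A quick way to confirm that the decorator refactoring preserves the
    old evaluate-once sharing is to apply 'stateify' (copied unchanged
    from above) to a memoized function.  The 'memo' and 'fib' names are
    illustrative.  The sketch also exposes one limitation: the shared
    state can be overridden only by keyword, since passing it positionally
    as well as through the decorator raises TypeError.

```python
from functools import wraps

def stateify(states):
    def _wrap(func):
        @wraps(func)
        def _wrapper(*args, **kwds):
            new_kwargs = states.copy()   # shallow copy: values are shared
            new_kwargs.update(kwds)      # explicit keywords take priority
            return func(*args, **new_kwargs)
        return _wrapper
    return _wrap

memo = {}

@stateify({'cache': memo})
def fib(n, cache):
    # 'cache' is the same dict on every call, like an old mutable default.
    if n not in cache:
        cache[n] = n if n < 2 else fib(n - 1) + fib(n - 2)
    return cache[n]

print(fib(30))    # 832040
print(len(memo))  # 31: entries for 0..30 persist between calls

try:
    fib(5, {})    # 'cache' given positionally AND by the decorator
    clash = False
except TypeError:
    clash = True
print(clash)      # True
```
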

    Code such as the following (which was also mentioned in the Rationale):

        b = 42 #outer b
        def foo(a=b): #ERROR: refers to local b, not outer b!
            b = 7 #local b

    which has default values that refer to variables in enclosing scopes
    and contains assignments to local variables of the same names will
    also be incompatible, as the 'b' in the default argument refers to the
    local 'b' rather than the outer 'b', resulting in an UnboundLocalError
    because the local variable 'b' has not been assigned to at the time
    the default expression for 'a' is evaluated.  Such code will need to
    rename the affected variables.

    The changes in this PEP are backwards-compatible with all code whose
    default argument values are immutable, including code using the idiom
    mentioned in the 'Motivation' section.  However, such values will now
    be recomputed for each call for which they are required.  This may
    cause performance degradation.  If such recomputation is significantly
    expensive, the same refactoring mentioned above can be used.

    A survey of the standard library for Python v2.5, produced via a
    script [7], gave the following statistics for the standard library
    (608 files, test suites were excluded; percentages are of the 3822
    default arguments counted in total):

        total number of non-None immutable default arguments: 1585 (41.5%)
        total number of mutable default arguments: 186 (4.9%)
        total number of default arguments with a value of None: 1813 (47.4%)
        total number of default arguments with unknown mutability: 238 (6.2%)
        total number of comparisons to None: 940

    Note: The number of comparisons to None refers to *all* such
    comparisons, not necessarily just those used in the idiom mentioned in
    the Motivation section.

    Looking more closely at the script's output, it appears that Tix.py and
    Tkinter.py are the primary users of mutable default arguments in the
    standard library.

    Similarly, examination of the unknown default arguments reveals that a
    significant fraction are functions, classes, or constants, which
    should, for the most part, not be functionally affected by this
    proposal.

    Assuming the standard library is indicative of Python code in general,
    the change in semantics will have comparatively little impact on the
    correct operation of Python programs.

    Running pybench with modifications to simulate the proposed semantics
    [8] shows that Python function/method calls using default arguments
    run about 4.4%-6.5% slower than under the current semantics.  However,
    as the simulation of the proposed semantics is crude, this should be
    considered an upper bound for any performance decreases this proposal
    might cause.
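
    For readers without a pybench setup, a rough equivalent of the
    attached pybench patch can be sketched with the standard 'timeit'
    module: it compares constant defaults against the None idiom that the
    patch uses to simulate per-call evaluation.  The function names and
    iteration counts are arbitrary, the measured percentage will vary by
    machine and interpreter, and the lambda wrapper adds the same overhead
    to both sides, so only the relative difference is of interest:

```python
import timeit

def h_current(a, b, c, d=1, e=2, f=3):
    # Constant defaults, evaluated once at definition time (current semantics).
    return a + b + c + d + e + f

def h_simulated(a, b, c, d=None, e=None, f=None):
    # Per-call evaluation simulated with the None idiom, as in the pybench patch.
    if d is None: d = 1
    if e is None: e = 2
    if f is None: f = 3
    return a + b + c + d + e + f

def best_time(func, number=100000, repeat=5):
    # Minimum over several runs, to reduce noise from other activity.
    return min(timeit.repeat(lambda: func(1, 2, 3), number=number, repeat=repeat))

t_cur = best_time(h_current)
t_sim = best_time(h_simulated)
print("relative slowdown: %.1f%%" % (100.0 * (t_sim - t_cur) / t_cur))
```
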

    In relation to Python 3.0, this PEP's proposal is compatible with those
    of PEP 3102 [9] and PEP 3107 [10], though it does not depend on the
    acceptance of either of those PEPs.


Reference Implementation

    All code of the form:

        def foo(bar=some_expr, baz=other_expr):
            #body

    should be compiled as if it had read (in pseudo-Python):

        def foo(bar=_undefined, baz=_undefined):
            if bar is _undefined:
                bar = some_expr
            if baz is _undefined:
                baz = other_expr
            #body

    where '_undefined' is the value given to a parameter when the caller
    didn't specify a value for it.  This is not intended to be a literal
    translation, but rather a demonstration as to how Python's
    argument-handling machinery should act.  Specifically, there should be
    no Python-level value corresponding to _undefined, nor should a
    literal translation such as that shown necessarily be used.
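
    In current Python the closest approximation is a private sentinel
    object.  The sketch below is hypothetical and differs from the
    proposal in one stated respect: its '_undefined' is an ordinary,
    visible Python value.  It does show the intended behavior, including
    per-call evaluation and a later default ('baz') that uses an earlier
    parameter ('bar'); here 'some_expr' and 'other_expr' are stood in for
    by '[]' and 'len(bar)':

```python
# Hypothetical sentinel; the proposal would keep this invisible to Python code.
_undefined = object()

def foo(bar=_undefined, baz=_undefined):
    if bar is _undefined:
        bar = []            # stands in for some_expr: fresh on every call
    if baz is _undefined:
        baz = len(bar)      # stands in for other_expr: may use 'bar'
    return bar, baz

first, _ = foo()
second, _ = foo()
print(first is second)  # False: each call built its own list
print(foo([1, 2, 3]))   # ([1, 2, 3], 3)
print(foo(baz=7))       # ([], 7)
```

    Using a dedicated sentinel rather than None also means that an
    explicit None passed by a caller is treated as a real argument rather
    than as an omitted one.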


References

    [0] 10 Python pitfalls
        http://zephyrfalcon.org/labs/python_pitfalls.html

    [1] Python Gotchas
        http://www.ferg.org/projects/python_gotchas.html#contents_item_6

    [2] When Pythons Attack
        http://www.onlamp.com/pub/a/python/2004/02/05/learn_python.html?page=2

    [3] 4. More Control Flow Tools
        http://docs.python.org/tut/node6.html#SECTION006710000000000000000

    [4] [Python-ideas] fixing mutable default argument values
        http://mail.python.org/pipermail/python-ideas/2007-January/000073.html

    [5] [Python-ideas] proto-PEP: Fixing Non-constant Default Arguments
        http://mail.python.org/pipermail/python-ideas/2007-January/000121.html

    [6] A Switch/Case Statement
        http://www.python.org/dev/peps/pep-3103/

    [7] Script to generate default argument statistics
        See attachment.

    [8] Patch to pybench/Calls.py
        See attachment.

    [9] Keyword-Only Arguments
        http://www.python.org/dev/peps/pep-3102/

    [10] Function Annotations
        http://www.python.org/dev/peps/pep-3107/


Copyright

    This document has been placed in the public domain.
23c23,26
<         def h(a,b,c,d=1,e=2,f=3):
---
>         def h(a,b,c,d=None,e=None,f=None):
>             if d is None: d=1
>             if e is None: e=2
>             if f is None: f=3
103c106,109
<         def h(a,b,c,d=1,e=2,f=3):
---
>         def h(a,b,c,d=None,e=None,f=None):
>             if d is None: d=1
>             if e is None: e=2
>             if f is None: f=3
256c262
<             def k(self,a,b,c=3):
---
>             def k(self,a,b,c=None):
257a264
>                 if c is None: c=3
362c369
<             def k(self,a,b,c=3):
---
>             def k(self,a,b,c=None):
363a371
>                 if c is None: c=3
434c442,445
< def h(a,b,c,d=1,e=2,f=3):
---
> def h(a,b,c,d=None,e=None,f=None):
>     if d is None: d=1
>     if e is None: e=2
>     if f is None: f=3
#!/usr/bin/env python
from __future__ import division
from glob import glob
import re

LIBS = '/home/chris/Python-2.5/Lib/' #location of Python std lib
#filter out test suites
files = [f for f in set(glob(LIBS+'*.py') + glob(LIBS+"*/*.py")) if 'test_' not in f and '_tests' not in f]


class Argument(object):
    def __init__(self, filename, funcname, string):
        self.file = filename
        self.func = funcname
        self.string = string
    
    def display(self):
        print self.func, 'in', self.file+':', self.string

defargpat = r"[_A-Za-z]\w* *= *"
class ArgKind(object):
    kinds = []
    def __init__(self, name, track, *patterns):
        if track:
            self.args = []
        else:
            self._total = 0
        self.track = track
        self.name = name
        self.patterns = [re.compile(defargpat + pat) for pat in patterns]
        self.__class__.kinds.append(self)
    
    def process(self, filename, funcname, args):
        for pat in self.patterns:
            args = self._process(pat, filename, funcname, args)
        return args
    
    def _process(self, pattern, filename, funcname, args):
        #count + remove our args
        for match in re.finditer(pattern, args):
            match = match.group()
            if self.track:
                self.args.append(Argument(filename, funcname, match))
            else:
                self._total += 1
            args = args.replace(match, '')
        #squeeze spaces + commas
        args = re.sub(" {2,}", " ", args)
        args = re.sub("(, ?){2,}", ",", args)
        #remove leading commas + spaces
        args = args.lstrip(', ')
        return args
    
    def _getTotal(self):
        if self.track:
            return len(self.args)
        else:
            return self._total
    total = property(_getTotal)
    
    def display(self):
        print "==="+self.name, "args==="
        if self.track:
            for arg in self.args:
                arg.display()
        print "total:", self.total, "("+str(round(100*(self.total / ArgKind.overall_total()), 1))+"%)"
        print
    
    @classmethod
    def overall_total(cls):
        return sum(kind.total for kind in cls.kinds)


nonecmp = re.compile(r"\w+ *((==)|(is)) *None")

none = ArgKind("none", False, "None")

nums = r"(-)?\d+(\.\d+)?"
bools = "((True)|(False))"
dqstrs = '"([^"]|(\\"))*"'
sqstrs = "'([^']|(\\'))*'"
mods = r"((sys)|(os))\.(\w|\.)+"
tups = r"\(.*?\)"
builtins = "((int)|(float)|(cmp)|(getattr)|(abs)|(filter)|(object)|(sum))"
const = ArgKind("const", False, nums, bools, dqstrs, sqstrs, mods, tups, builtins)

dicts = r"\{.*?\}"
lists = r"\[.*?\]"
mutable = ArgKind("mutable", True, dicts, lists)

spaces = re.compile("^ +$")
class UnknownArgs(ArgKind):
    def __init__(self):
        ArgKind.__init__(self, "unknown", True)
        
    def process(self, filename, funcname, args):
        if '=' in args:
            for arg in args.split(","):
                #filter out empty + spaces-only 'args'
                if arg and not spaces.match(arg):
                    self.args.append(Argument(filename, funcname, arg))
        return ""
unknown = UnknownArgs()

none_cmps = 0

def clean(string):
    #remove spaces
    args = re.sub("[ \t]+", " ", string.replace('\n', ''))
    args = args.split(',')
    #remove required args
    for i, arg in enumerate(args):
        if '=' in arg: break
    args = args[i:]
    return ','.join(args)


defpat = re.compile(r"def +(\w+) *\((.*?)\) *:", re.S)
for i, filepath in enumerate(files):
    f = open(filepath)
    content = f.read()
    f.close()
    filename = filepath.split('/')[-1]
    
    for adef in re.finditer(defpat, content):
        func = adef.group(1)
        args = clean(adef.group(2))
        
        if '=' in args:#has suspect args
            for arg_kind in [none, const, mutable, unknown]:
                args = arg_kind.process(filename, func, args)
    
    none_cmps += len(re.findall(nonecmp, content))

print 'In', len(files), 'files:'
print
for arg_kind in [none, const, mutable, unknown]:
    arg_kind.display()
print 'Also, comparisons to None:', none_cmps