BACKGROUND  (skippable if you're a know-it-all)

Argument parsing for Python functions follows some very strict rules.
Unless the function implements its own parsing like so:

    def black_box(*args, **kwargs): ...

there are some semantics that are always true.  For example:

    * Any parameter that has a default value is optional, and vice-versa.

    * It doesn't matter whether you pass in an argument by name or by
      position; it behaves the same.

    * You can see the default values by examining the function's
      inspect.Signature.

    * Calling a function and passing in the default value for a parameter
      is identical to calling the function without that parameter.

      e.g. (assuming foo is a pure function):

         def foo(a=value): ...

         foo() == foo(value) == foo(a=value)

      With that signature, foo() literally can't tell the difference
      between those three calls.  And it doesn't matter what the type
      of value is or where you got it.
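
These rules can be checked with a small sketch.  The names foo and
DEFAULT are hypothetical, chosen only for illustration, and foo is
assumed to be a pure function:

```python
import inspect

DEFAULT = 42  # hypothetical default value, not from the original post


def foo(a=DEFAULT):
    # A pure function: the result depends only on the argument.
    return a * 2


# The default is visible through introspection.
sig = inspect.signature(foo)
published_default = sig.parameters["a"].default

# All three calls bind the same value to `a`, so foo() cannot
# tell them apart.
results = (foo(), foo(DEFAULT), foo(a=DEFAULT))
```

All three entries in results are equal, and published_default is the
same object that was written in the def statement.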


Python builtins are a little less regular.  They effectively do their own
parsing.  So they *could* do any crazy thing they want.  99.9% of the time
they do one of four standard things:
  * They parse their arguments with a single call to PyArg_ParseTuple().
  * They parse their arguments with a single call to
    PyArg_ParseTupleAndKeywords().
  * They take a single argument of type "object" (METH_O).
  * They take no arguments (METH_NOARGS).

PyArg_ParseTupleAndKeywords() behaves almost exactly like a Python
function.  PyArg_ParseTuple() is a little less like a Python function,
because it doesn't support keyword arguments.  (Surely this behavior is
familiar to you!)

But then there's that funny 0.1%, the builtins that came up with their
own unique approach for parsing arguments--giving them funny semantics.
Argument Clinic tries to accommodate these as best it can.  (That's why
it supports "optional groups", for example.)  But it can only do so much.


THE PROBLEM

Argument Clinic's original goal was to provide an introspection signature
for every builtin in Python.

But a small percentage of builtins have funny semantics that aren't
expressible in a valid Python signature.  This makes them hard to convert
to Argument Clinic, and makes their signature inaccurate.

If we want these functions to have an accurate Python introspection
signature, their argument parsing will have to change.


THE QUESTION

What should someone converting functions to Argument Clinic do
when faced with one of these functions?

Of course, the simplest answer is "nothing"--don't convert the
function to Argument Clinic.   We're in beta, and any change
that isn't a bugfix is impermissible.  We can try again for 3.5.

But if "any change" is impermissible, then we wouldn't have the
community support to convert to Argument Clinic right now.  The
community wants proper signatures for builtins badly enough that
we're doing it now, even though we're already in beta for Python
3.4.  Converting to Argument Clinic is, in the vast majority of
cases, a straightforward and low-risk change--but it is *a*
change.

Therefore perhaps the answer isn't an automatic "no".  Perhaps
additional straightforward, low-risk changes are permissible.  The
trick is, what constitutes a straightforward, low-risk change?
Where should we draw the line?  Let's discuss it.  Perhaps a
consensus will form around an answer besides a flat "no".


THE SPECIFICS

I'm sorting the problems we see into four rough categories.

a) Functions where there's a static Python value that behaves
   identically to not passing in that parameter (aka "the NULL problem")

   Example:
     _sha1.sha1().  Its optional parameter has a default value in C of
     NULL.  We can't express NULL in a Python signature.  However, it
     just so happens that _sha1.sha1(b'') is exactly equivalent to
     _sha1.sha1().  b'' makes for a fine replacement default value.

     The same holds for list.__init__().  Its optional "sequence" parameter has
     a default value in C of NULL.  But this signature:
        list.__init__(sequence=())
     works fine.

     The way Clinic works, we can actually still use the NULL as the default
     value in C.  Clinic will let you use completely different values as
     the published default value in Python and the real default value in C.
     (Consenting adults rule and all that.)  So we could lie to Python and
     everything works just the way we want it to.

   Possible Solutions:
     0) Do nothing, don't convert the function.
     1) Use that clever static value as the default.
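
   The equivalence behind option 1 can be checked from Python.  This
   sketch goes through hashlib.sha1, the public constructor, on the
   assumption that it behaves like _sha1.sha1() for this purpose:

```python
import hashlib

# Hashing with no data and hashing an empty bytes object give the
# same digest, so b'' can stand in for the C-level NULL default.
digest_without_arg = hashlib.sha1().hexdigest()
digest_with_empty = hashlib.sha1(b"").hexdigest()

# Likewise, an empty tuple is indistinguishable from no argument
# when building a list.
list_without_arg = list()
list_with_empty = list(())
```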


b) Functions where there's no static Python value that behaves identically to
   not passing in that parameter (aka "the dynamic default problem")

   There are functions with parameters whose defaults are mildly dynamic,
   responding to other parameters.

   Example:
     I forget its name, but someone recently showed me a builtin that took
     a list as its first parameter, and its optional second parameter
     defaulted to the length of the list.  As I recall this function didn't
     allow negative numbers, so -1 wasn't a good fit.

   Possible solutions:
     0) Do nothing, don't convert the function.
     1) Use a magic value as the default.  Preferably of the same type
        as the function accepts, but failing that use None.  If the
        caller passes in the magic value, use the previous default
        value.  Guido himself
        suggested this in
     2) Use an Argument Clinic "optional group".  This only works for
        functions that don't support keyword arguments.  Also, I hate
        this, because "optional groups" are not expressible in Python
        syntax, so these functions automatically have invalid signatures.
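
   A minimal sketch of option 1 in pure Python.  take_prefix is a
   hypothetical function invented for illustration; it is not the
   builtin mentioned in the example.  None is published as the default
   because no non-negative integer is free to serve as the magic value:

```python
import inspect


def take_prefix(items, count=None):
    """Return the first `count` elements of `items`.

    Hypothetical function: `count` really defaults to len(items),
    which cannot be written as a static default.  None is the
    published default and is swapped for the dynamic one at call time.
    """
    if count is None:
        count = len(items)
    if count < 0:
        raise ValueError("count must be non-negative")
    return items[:count]


# The signature is valid Python and shows the magic value.
published = inspect.signature(take_prefix).parameters["count"].default
```

   Passing None explicitly and omitting the argument are then
   indistinguishable, which restores the usual Python semantics.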


c) Functions that accept an 'int' when they mean 'boolean' (aka the
   "ints instead of bools" problem)

   This is specific but surprisingly common.

   Before Python 3.3 there was no PyArg_ParseTuple format unit that meant
   "boolean value".  Functions generally used "i" (int).  Even older
   functions accepted an object and called PyLong_AsLong() on it.
   Passing in True or False for "i" (or PyLong_AsLong()) works, because
   bool inherits from int.  But anything other than ints and bools
   raises an exception.

   In Python 3.3 I added the "p" format unit for boolean arguments.
   This calls PyObject_IsTrue() which accepts nearly any Python value.

   I assert that Python has a crystal clear definition of what
   constitutes "true" and "false".  These parameters are clearly
   intended as booleans but they don't conform to the boolean
   protocol.  So I suggest every instance of this is a (very mild!)
   bug.  But changing these parameters to use "p" is a change: they'll
   accept many more values than before.

   Right now people convert these using 'int' because that's an exact
   match.  But sometimes they are optional, and the person doing the
   conversion wants to use True or False as a default value, and it
   doesn't work: Argument Clinic's type enforcement complains and
   they have to work around it.  (Argument Clinic has to enforce some
   type-safety here because the values are used as defaults for C
   variables.)  I've been asked to allow True and False as defaults
   for "int" parameters specifically because of this.

   Example:
     str.splitlines(keepends)

   Possible solutions:
     1) Use "bool".
     2) Use "int", and I'll go relax Argument Clinic so they
        can use bool values as defaults for int parameters.
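
   The difference between the two options can be modeled in pure
   Python.  This is only a rough sketch of the behavior described
   above, not the C parsing code; parse_as_int and parse_as_predicate
   are hypothetical names:

```python
def parse_as_int(value):
    # Rough model of an "int" parameter as described above: ints
    # (and bools, which subclass int) pass, anything else is rejected.
    if not isinstance(value, int):
        raise TypeError("an integer is required")
    return int(value)


def parse_as_predicate(value):
    # Rough model of the "p" format unit: ordinary truth testing,
    # which works for nearly any object.
    return bool(value)


def accepts(parser, value):
    # Helper for the comparison: does this parser accept the value?
    try:
        parser(value)
    except TypeError:
        return False
    return True
```

   Under this model, accepts(parse_as_int, "yes") is False while
   accepts(parse_as_predicate, "yes") is True, which is exactly the
   widening of accepted values that switching to "p" would cause.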

d) Functions with behavior that deliberately defies being expressed as a
   Python signature (aka the "untranslatable signature" problem)

   Example:
     itertools.repeat(), which behaves differently depending on whether
     "times" is supplied as a positional or keyword argument.  (If
     "times" is <0, and was supplied via position, the function yields
     0 times. If "times" is <0, and was supplied via keyword, the
     function yields infinitely-many times.)

   Possible solutions:
     0) Do nothing, don't convert the function.
     1) Change the signature until it is Python compatible.  This new
        signature *must* accept a superset of the arguments accepted
        by the existing signature.  (This is being discussed right
        now in issue #19145.)
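
   To see why no Python signature fits, here is a pure-Python model of
   the behavior described in the example.  It is a sketch of the
   semantics only, not the actual C implementation; repeat_model and
   _MISSING are hypothetical names.  The function has to parse *args
   itself to learn how "times" was passed:

```python
import itertools

_MISSING = object()  # hypothetical sentinel meaning "times not given"


def repeat_model(obj, *args, times=_MISSING):
    """Model of the position-vs-keyword behavior described above."""
    if args and times is not _MISSING:
        raise TypeError("times given both by position and by keyword")
    if args:
        # "times" supplied by position: negative means zero repetitions.
        (count,) = args
        return itertools.repeat(obj, max(count, 0))
    if times is _MISSING or times < 0:
        # "times" omitted, or negative and supplied by keyword:
        # repeat forever.
        return itertools.repeat(obj)
    return itertools.repeat(obj, times)
```

   An ordinary parameter such as "times=None" cannot reproduce this,
   because a normal Python function never learns whether an argument
   arrived by position or by keyword.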


//arry/