New submission from Larry Hastings:

Sorry this is so long--but I wanted to make my point. Here's the tl;dr summary.
The problem: The syntax used for Argument-Clinic-generated text signatures for builtins means CPython mistakenly identifies hand-written, unparsable pseudo-signatures as legitimate signatures. This causes real, non-hypothetical problems.

I think we should change the syntax to something people would never write by accident. Here are some suggestions:

    "*("
    "*clinic*("
    "\01 clinic("

--

A quick recap on how signature information for builtins works.

The builtin's docstring contains the signature, encoded as text using a special syntax on the first line. CPython callables always have getters for their __doc__ member; the doc getter function examines the first line, and if it detects a signature it skips past it and returns the rest. CPython's new getter on callables, __text_signature__, also looks at the internal docstring. If it detects a signature it returns it, otherwise it returns None.

inspect.signature then retrieves __text_signature__, and if ast.parse() parses it, it populates the appropriate Signature and returns that. And then pydoc uses the Signature object to print the first line of help().

In #19674 there was some discussion on what this syntax should be. Guido suggested they look like this:

    functionname(args, etc)\n

He felt it was a good choice, and pointed out that Sphinx autodoc uses this syntax. (Not because using this syntax would help Sphinx--it won't. Just as a "here's how someone else solved the problem" data point.)

__doc__ and __text_signature__ aren't very smart about detecting signatures. Here's their test in pseudo-code:

    if the first N bytes match the name of the function,
      and the N+1th byte is a left parenthesis,
    then it's assumed to be a valid signature.

--

First, consider: this signature syntax is the convention docstrings already use. Nearly every builtin callable in Python has a hand-written docstring that starts with "functionname(".
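For illustration only, the pseudo-code test above can be written out as a few lines of Python. This is a sketch of the described behavior, not CPython's actual C implementation; the helper names looks_like_text_signature and split_docstring are made up for this example.

```python
def looks_like_text_signature(name, docstring):
    # The first N characters must equal the function name, and the
    # N+1th character must be a left parenthesis. Nothing else is checked.
    return docstring.startswith(name + "(")


def split_docstring(name, docstring):
    # Hypothetical helper: returns (text_signature, doc) roughly the way
    # the __text_signature__ and __doc__ getters are described above.
    if not looks_like_text_signature(name, docstring):
        return None, docstring
    first_line, _, rest = docstring.partition("\n")
    return first_line, rest


# A docstring in the suggested "functionname(args, etc)\n" form.
sig, doc = split_docstring("functionname", "functionname(args, etc)\nDo a thing.")
print(sig)  # the detected signature line
print(doc)  # what the __doc__ getter would hand back

# A docstring that does not start with the function name is left alone.
print(split_docstring("functionname", "Do a thing."))
```

Note that the test looks only at the first few characters; it never checks whether the rest of the line is valid Python.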
"Great!", you might think, "we get signatures for free, even on functions that haven't been converted to Argument Clinic!" The problem is, many of these pseudo-signatures aren't proper Python. Consider the first line of the docstring for os.lstat():

    "lstat(path, *, dir_fd=None) -> stat result\n"

This line passes the "is it a text signature?" test, so __doc__ skips past it and __text_signature__ returns it. But it isn't actually valid. ast.parse() rejects it, so inspect.signature returns nothing. pydoc doesn't get a valid signature, so it prints "lstat(...)", and the user is deprived of the helpful line handwritten by lstat's author.

That's bad enough. Now consider the first *two* lines of the docstring for builtin open():

    "open(file, mode='r', buffering=-1, encoding=None,\n"
    " errors=None, newline=None, closefd=True, opener=None) -> file object\n"

__doc__ clips the first line but retains the second. pydoc prints "open(...)", followed by the second line! Now we have the problem reported in #20075: "help(open) eats first line".

Both of these problems go away if I add one more check to the signature-detecting code: does the line end with ')'? But that's only a band-aid on the problem. Consider socket.accept's docstring:

    "_accept() -> (integer, address info)\n"

Okay, so __doc__ and __text_signature__ could count parentheses and require them to balance. But then they'd have to handle strings that contain parentheses, which means they'd also have to understand string quoting. And there would *still* be handwritten docstrings that would pass that test but wouldn't parse properly. Consider bisect.insort_right:

    "insort_right(a, x[, lo[, hi]])\n"

We could only be *certain* if we gave up on having two parsers. Write the signature-recognizer code only once, in C, then call that in __doc__ and __text_signature__ and inspect.signature(). But that seems unreasonable.

Okay, so we could attack the problem from the other end.
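As a rough check of the examples above, the sketch below runs three of the quoted first lines through the naive test, the "ends with ')'" band-aid, a simple parenthesis count, and ast.parse(). It is illustrative only: the helper names are invented here, and wrapping each line in a dummy "def ...: pass" is an assumption standing in for whatever inspect.signature really passes to ast.parse().

```python
import ast


def passes_naive_test(name, line):
    # The existing check: function name immediately followed by "(".
    return line.startswith(name + "(")


def ends_with_paren(line):
    # The band-aid: the line must also end with ")".
    return line.endswith(")")


def parens_balance(line):
    # Naive parenthesis counting; it ignores string quoting entirely.
    return line.count("(") == line.count(")")


def parses(line):
    # Assumption: a dummy "def" wrapper is close enough to the real
    # parsing step to show whether the line is valid Python.
    try:
        ast.parse("def " + line + ": pass")
    except SyntaxError:
        return False
    return True


# First lines quoted in this message, without the trailing newline.
examples = {
    "lstat": "lstat(path, *, dir_fd=None) -> stat result",
    "_accept": "_accept() -> (integer, address info)",
    "insort_right": "insort_right(a, x[, lo[, hi]])",
}

for name, line in examples.items():
    print(name,
          passes_naive_test(name, line),
          ends_with_paren(line),
          parens_balance(line),
          parses(line))
```

All three lines pass the naive test and none of them parses; the last two also survive both of the stricter string-level checks, which is the point being made above.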
Clean up all the docstrings in CPython, either by converting to Argument Clinic or just fixing them by hand. But that means that *third-party modules* will still have the mysterious problem.

Therefore I strongly suggest we switch to a syntax that nobody will ever use by accident. Have I convinced you?

----------
assignee: larry
messages: 208634
nosy: barry, brett.cannon, gennad, gvanrossum, larry, ncoghlan, skrah, zach.ware
priority: normal
severity: normal
stage: needs patch
status: open
title: Argument Clinic should use a non-error-prone syntax to mark text signatures
type: behavior
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue20326>
_______________________________________