Re: [Python-ideas] Why is design-by-contracts not widely adopted?

Marko Ristin-Kaufmann Mon, 24 Sep 2018 00:47:17 -0700

Hi,

Thank you for your replies, Hugh and David! Please let me address the
points in serial.

*Obvious benefits*
You both seem to misunderstand contracts. The goal of design-by-contract
is not limited to testing the correctness of the code, as I have already
reiterated a couple of times in the previous thread. Contracts document
*formally* what the caller and the callee expect and need to satisfy when
using a method, a function or a class. This is meant for a module that is
used by multiple people who are not necessarily familiar with the code.
They are *not* a niche. There are 150K projects on pypi.org. Each one of
them would benefit from being annotated with contracts.

Please do re-read my previous messages on the topic a bit more attentively.
I also found these two pages informative, and they are quick to read
(<15 min):
https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-by-contract/index.html
https://gleichmann.wordpress.com/2007/12/09/test-driven-development-and-design-by-contract-friend-or-foe/

Here is a quick summary of the argument.

When you are documenting a method you have the following options:
1) Write preconditions and postconditions formally and include them
automatically in the documentation (*e.g.*, by using the icontract library).
2) Write preconditions and postconditions in the docstring of the method as
human text.
3) Write doctests in the docstring of the method.
4) Expect the user to read the actual implementation.
5) Expect the user to read the testing code.

Here is what seems obvious to me. *Please do point me to what is not
obvious to you* because that is the piece of puzzle that I am missing (*i.e.
*why this is not obvious and what are the intricacies). I enumerated the
statements for easier reference:

a) Using 1) is the only option when you have to deal with inheritance.
Other approaches cannot do that *without much repetition* in practice.

b) If you write contracts in text, they will become stale over time
(*i.e.*, disconnected from the implementation and *plain wrong and
misleading*). It is a common problem that comments rot over time and I
hope it does not need further argument (please let me know if you really
think that comments *do not rot*).

c) Using 3), doctests, means that you need mocking as soon as your method
depends on non-trivial data structures. Moreover, if the output of the
function is not trivial and/or long, you actually need to write the
contract (as postcondition) just *after *the call in the doctest.
Documenting preconditions includes writing down the error that will be
thrown. Additionally, you need to write that what you are documenting
actually also holds for all other examples, not just for this particular
test case (*e.g.*, in human text as a separate sentence before/after the
doctest in the docstring).

d) Reading other people's code in 4) and 5) is not trivial in most cases
and requires a lot of attention as soon as the method includes calls to
other methods and functions. This is impractical in most situations since
*most code is non-trivial to read* and is subject to frequent changes.

e) Most of the time, 5) is not even a viable option, as the testing code is
not shipped with the package. Reading it means going to GitHub (if the
package is open source) and searching through the directory tree to find
the test. This forces every user of a library to become familiar with the
*testing code* of the library.

f) 4) and 5) are *obviously* a waste of time for the user -- please do
explain why this might not be the case. Whenever I use the library, I do
not expect to be forced to look into its test code and its implementation.
I expect to read the documentation and just use the library if I'm
confident about its quality. I have rarely read the implementation of the
standard libraries (notable exceptions in my case are ast and subprocess
module) or any well-established third-party library I frequently use
(numpy, opencv, sklearn, nltk, zmq, lmdb, sqlalchemy). If the documentation
is not clear about the contracts, I use trial-and-error to figure out the
contracts myself. This again is *obviously* a waste of the *user's* time,
and it's far easier to read the contracts directly than to use trial and
error.
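
To make point a) concrete, here is a minimal sketch of how a precondition
can be inherited without repeating it in the subclass. It uses only the
standard library; `require`, `Contracted`, `Account` and `AuditedAccount`
are hypothetical names made up for this illustration and are not the
icontract API. For simplicity the sketch re-uses the parent's precondition
unchanged and does not model weakening of preconditions in subclasses.

```python
import functools


def require(condition, description):
    """Record a precondition on a method (hypothetical helper, not icontract)."""
    def decorator(func):
        func.preconditions = getattr(func, "preconditions", []) + [
            (condition, description)
        ]
        return func
    return decorator


def _with_checks(func, preconditions):
    """Wrap func so that all the given preconditions are checked on each call."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        for condition, description in preconditions:
            if not condition(self, *args, **kwargs):
                raise ValueError(f"Precondition violated: {description}")
        return func(self, *args, **kwargs)
    wrapper.preconditions = preconditions
    return wrapper


class Contracted:
    """Hypothetical base class: overriding methods inherit the preconditions."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        for name, func in list(vars(cls).items()):
            if name.startswith("_") or not callable(func):
                continue
            # Collect the preconditions declared on this method and on the
            # same-named methods of all the base classes.
            preconditions = []
            for klass in cls.__mro__:
                for item in getattr(vars(klass).get(name), "preconditions", []):
                    if item not in preconditions:
                        preconditions.append(item)
            setattr(cls, name, _with_checks(func, preconditions))


class Account(Contracted):
    def __init__(self):
        self.balance = 0

    @require(lambda self, amount: amount > 0, "amount > 0")
    def deposit(self, amount):
        self.balance += amount


class AuditedAccount(Account):
    def __init__(self):
        super().__init__()
        self.log = []

    # No contract is repeated here: the precondition comes from Account.
    def deposit(self, amount):
        self.log.append(amount)
        super().deposit(amount)
```

With this sketch, `AuditedAccount().deposit(-1)` raises a `ValueError`
before the overriding method body runs, although the subclass never
restates the condition. With approaches 2) and 3), the text or the doctest
would have to be copied into every overriding method.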

*Contracts are difficult to read.*
David wrote:

> To me, as I've said, DbC imposes a very large cost for both writers and
> readers of code.
>

This is again something that eludes me and I would be really thankful if
you could clarify. Please consider for an example, pypackagery (
https://pypackagery.readthedocs.io/en/latest/packagery.html) and the
documentation of its function resolve_initial_paths:
packagery.resolve_initial_paths(initial_paths)

    Resolve the initial paths of the dependency graph by recursively adding
    *.py files beneath given directories.

    Parameters:
        initial_paths (List[Path]) – initial paths as absolute paths
    Return type:
        List[Path]
    Returns:
        list of initial files (i.e. no directories)
    Requires:
        - all(pth.is_absolute() for pth in initial_paths)
    Ensures:
        - len(result) >= len(initial_paths) if initial_paths else result == []
        - all(pth.is_absolute() for pth in result)
        - all(pth.is_file() for pth in result)

How is this difficult to read, unless the reader is unfamiliar with the
formalism and has a hard time parsing the quantifiers and logic rules? Mind
that all these bits are deemed *important* by the writer and need to be
included in the function description *somehow*; you can choose between
1)-5). 1) seems obviously best to me. The contracts in 1) are checked at
least at test time. If I have a bug in the implementation (*e.g.*, I include
a directory in the result), the testing framework will notify me.
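
As a rough illustration of that last point, here is a sketch of a
postcondition catching exactly this kind of bug during a test run. It uses
only the standard library; the `ensure` decorator and the function
`list_py_files_buggy` are hypothetical stand-ins invented for this example
and are neither icontract nor the actual pypackagery implementation.

```python
import functools
import pathlib
import tempfile
from typing import Callable, List


def ensure(condition: Callable[[List[pathlib.Path]], bool], description: str):
    """Check a postcondition on the result (hypothetical helper, not icontract)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not condition(result):
                raise AssertionError(f"Postcondition violated: {description}")
            return result
        return wrapper
    return decorator


@ensure(lambda result: all(pth.is_file() for pth in result),
        "all(pth.is_file() for pth in result)")
def list_py_files_buggy(directory: pathlib.Path) -> List[pathlib.Path]:
    """Made-up implementation; the bug below is intentional."""
    # Bug: the directory itself ends up in the result.
    return [directory] + sorted(directory.glob("**/*.py"))


violation = None
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / "module.py").write_text("x = 1\n")
    try:
        list_py_files_buggy(root)
    except AssertionError as err:
        violation = str(err)

print(violation)
```

Any test that merely calls the function trips the postcondition, even if
that test was written to check something else entirely.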

Here is what the reader would have needed to read without the formalism in
the docstring as text (*i.e.*, option 2):
* All input paths must be absolute.
* If the initial paths are empty, the result is an empty list.
* All the paths in the result are also absolute.
* The resulting paths only include files.

and here is an example with doctest (3):

>>> packagery.resolve_initial_paths([])
[]

>>> with temppathlib.NamedTemporaryFile() as tmp1, \
...         temppathlib.NamedTemporaryFile() as tmp2:
...     _ = tmp1.path.write_text("some text")
...     _ = tmp2.path.write_text("another text")
...     result = packagery.resolve_initial_paths([tmp1.path, tmp2.path])
...     assert all(pth.is_absolute() for pth in result)
...     assert all(pth.is_file() for pth in result)

>>> with temppathlib.TemporaryDirectory() as tmp:
...     packagery.resolve_initial_paths([tmp.path])
Traceback (most recent call last):
        ...
ValueError: Unexpected directory in the paths

>>> with temppathlib.TemporaryDirectory() as tmp:
...     pth = tmp.path / "some-file.py"
...     _ = pth.write_text("some text")
...     packagery.resolve_initial_paths([pth.relative_to(tmp.path)])
Traceback (most recent call last):
        ...
ValueError: Unexpected relative path in the initial paths

Now, how can reading the text (2, code rot) or reading the doctests (3,
longer, includes contracts) be easier and more maintainable compared to
reading the contracts? I would be really thankful for the explanation -- I
feel really stupid as for me this is totally obvious and, evidently, for
other people it is not.

I hope we all agree that the arguments about this example
(resolve_initial_paths) selected here are not particular to pypackagery,
but that they generalize to most of the functions and methods out there.

*Writing contracts is difficult.*
David wrote:

> To me, as I've said, DbC imposes a very large cost for both writers and
> readers of code.
>

The effort of writing contracts as of now includes:
* adding icontract (or any other design-by-contract library) to setup.py
(or requirements.txt), one line, one-off
* adding sphinx-icontract to docs/source/conf.py and
docs/source/requirements.txt, two lines, one-off
* writing your contracts (usually one line per contract).

The contracts (1) in the above-mentioned function look like this (omitting
the contracts run only at test time):

import pathlib
from typing import List

import icontract

@icontract.pre(
    lambda initial_paths: all(pth.is_absolute() for pth in initial_paths))
@icontract.post(lambda result: all(pth.is_file() for pth in result))
@icontract.post(lambda result: all(pth.is_absolute() for pth in result))
@icontract.post(
    lambda initial_paths, result:
    len(result) >= len(initial_paths) if initial_paths else result == [])
def resolve_initial_paths(
        initial_paths: List[pathlib.Path]) -> List[pathlib.Path]:
    ...

Putting aside how this code could be made more succinct (using an "args" or
"a" argument in the condition to represent the arguments, using from ...
import ..., renaming the "result" argument to "r", introducing shortcut
decorators slowpre and slowpost to encapsulate the slow contracts that
should not be executed in production), how is this difficult to write? It's
4 lines of code.
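
To give an idea of what such a slowpost shortcut could look like, here is
a sketch using only the standard library. The decorator itself, the
`SLOW_CONTRACTS` environment variable and the `enabled` parameter are all
assumptions made for this illustration; none of them is an existing
icontract feature.

```python
import functools
import os

# Hypothetical switch: slow contracts run only if SLOW_CONTRACTS=1 is set,
# for example in the test environment.
SLOW_CONTRACTS_ENABLED = os.environ.get("SLOW_CONTRACTS", "") == "1"


def slowpost(condition, enabled=SLOW_CONTRACTS_ENABLED):
    """Postcondition that is skipped entirely when slow contracts are off."""
    def decorator(func):
        if not enabled:
            # Return the function untouched: no wrapper, no run-time cost.
            return func

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            if not condition(result):
                raise AssertionError("Slow postcondition violated")
            return result
        return wrapper
    return decorator


def is_sorted(result):
    """O(n) check that would be too costly on every production call."""
    return all(a <= b for a, b in zip(result, result[1:]))


@slowpost(is_sorted, enabled=True)
def broken_sort(items):
    return list(items)  # intentionally wrong: nothing gets sorted


@slowpost(is_sorted, enabled=False)
def broken_sort_in_production(items):
    return list(items)  # same bug, but the contract is switched off
```

Calling `broken_sort([2, 1])` raises an `AssertionError`, while
`broken_sort_in_production([2, 1])` returns the unsorted list without any
check, which is the trade-off you accept for contracts that are too slow
for production.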

Writing text (2) is 4 lines. Writing doctests (3) is 23 lines and includes
the contracts. Again, given that the writer is trained in writing formal
expressions, the mental effort is the same for writing the text and writing
the formal contract (in cases of non-native English speakers, I'd even
argue that formal expressions are sometimes *easier* to write).

*99% vs 1%*

> I'm not alone in this. A large majority of folks formally educated in
> computer science and related fields have been aware of DbC for decades but
> deliberately decided not to use them in their own code. Maybe you and
> Bertram Meyer are simple better than that 99% of programmers... Or maybe
> the benefit is not so self-evidently and compelling as it feels to you.

I think that ignorance plays a major role here. Many people have
misconceptions about design-by-contract. They just use 2) for more
complex methods, or 3) for rather trivial methods. They are not aware that
it's easy to use contracts (1) and avoid them for irrational reasons
(*e.g.*, habits).

This is also what Todd Plesel writes in
https://www.win.tue.nl/~wstomv/edu/2ip30/references/design-by-contract/index.html#IfDesignByContractIsSoGreat
:

The vast majority of those developing software - even that intended to be
reused - are simply ignorant of the concept. As a result they produce
application programmer interfaces (APIs) that are under-specified thus
passing the burden to the application programmer to discover by trial and
error, the 'acceptable boundaries' of the software interface (undocumented
contract's terms). But such ad-hoc operational definitions of software
interface discovered through reverse-engineering are subject to change upon
the next release and so offers no stable way to ensure software
correctness.

The fact that many people involved in writing software lack pertinent
education (e.g., CS/CE degrees) and training (professional courses, read
software engineering journals, attend conferences etc.) is *not* a reason
they don't know about DBC since the concept is not covered adequately in
such mediums anyway. That is, *ignorance of DBC extends not just throughout
practitioners but also throughout educators and many industry-experts.*

He lists some more factors and misconceptions that hinder adoption. I
would definitely recommend that you read at least that section, if not the
whole piece.

The conclusion paragraph "Culture Shift: Too Soon or Too Late" was also
telling:

> *The simplicity and obvious benefits of Design By Contract lead one to
> wonder why it has not become 'standard practice' in the software
> development industry.* When the concept has been explained to various
> technical people (all non-programmers), they invariably agree that it is a
> sensible approach and some even express dismay that software components are
> not developed this way.
>
> It is just another indicator of the immaturity of the software development
> industry. The failure to produce high-quality products is also blatantly
> obvious from the non-warranty license agreement of commercial software. Yet
> consumers continue to buy software they suspect and even expect to be of
> poor quality. Both quality and lack-of-quality have a price tag, but the
> difference is in who pays and when. As long as companies can continue to
> experience rising profits while selling poor-quality products, what
> incentive is there to change? Perhaps the fall-out of the "Year 2000"
> problem will focus enough external pressure on the industry to jolt it
> towards improved software development methods. There is talk of certifying
> programmers like other professionals. If and when that occurs, the benefits
> of Design By Contract just might begin to be appreciated.
>
> But it is doubtful. Considering the typical 20 year rule for adopting
> superior technology, DBC as exemplified by Eiffel, has another decade to
> go. But if Java succeeds in becoming a widely-used language and JavaBeans
> become a widespread form of reuse then it would already be too late for DBC
> to have an impact. iContract will be a hardly-noticed event much like ANNA
> for Ada and A++ for C++. This is because *the philosophy/mindset/culture
> is established by the initial publication of the language and its standard
> library.*
>
(Emphasis mine; iContract refers to a Java design-by-contract library)

Hence the salient argument is the lack of *tools* for DbC. So far, none of
the existing DbC libraries in Python really had the capabilities needed to
be included in a code base. The programmer had to duplicate the contract,
the messages did not include the values involved in the condition, one
could not inherit the contracts, and the contracts were not included in the
documentation. Some libraries supported some of these features, but none
before icontract supported them all. icontract finally supports all
these features.
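
To show what "the values involved in the condition" means for a violation
message, here is a small sketch using only the standard library. The
`require` decorator and the `head` function are hypothetical and written
just for this example; the message format is also made up and is not the
format that icontract produces.

```python
import functools
import inspect


def require(condition, description):
    """Precondition whose violation message reports the argument values."""
    def decorator(func):
        signature = inspect.signature(func)
        # The condition may name only a subset of the function's arguments.
        wanted = list(inspect.signature(condition).parameters)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            values = {name: bound.arguments[name] for name in wanted}
            if not condition(**values):
                details = ", ".join(
                    f"{name} was {value!r}" for name, value in values.items())
                raise ValueError(
                    f"Precondition violated: {description}: {details}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


@require(lambda items, n: 0 <= n <= len(items), "0 <= n <= len(items)")
def head(items, n):
    """Return the first n items."""
    return items[:n]


try:
    head([1, 2, 3], 5)
except ValueError as err:
    message = str(err)
    print(message)
```

The printed message names both the violated condition and the offending
values, so the caller does not need a debugger to find out what went wrong.
A bare `assert 0 <= n <= len(items)` would report only that the assertion
failed.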

I have *never* seen a rational argument for how writing contracts (1) is
*inferior* to approaches 2-5), except that it's hard for programmers
untrained in writing formal expressions and that tools are lacking. I would
be really thankful if you could address these points and show me where I am
wrong *given that formalism and tools are not a problem*. We can train the
untrained, and we can develop tools (and put them into the standard
library). This will push adoption to far above 1%.

Finally, it is *obvious* to me that documentation is important. I see
lacking documentation as one of the major hits to the productivity of a
programmer. If there is a tool that could easily improve the documentation
(*i.e.*, formal contracts with one line of code per contract) and
automatically keep it in sync with the code (by checking the contracts
during testing), I don't see any *rational* reason why you would dispense
with such a tool. Again, please do correct and contradict me -- I don't want
to sound condescending or arrogant -- I literally can't wrap my head around
why *anybody* would dispense with such an easy-to-use tool that gives you
better documentation (*i.e.*, superior to approaches 2-5) except for lack of
formal skills and lack of a supporting library. If you think that
documentation is *not* important, then please do explain that, since it
goes counter to all my previous experience and intuition (which, of course,
can be wrong).

*Why not Eiffel?*
Hugh wrote:

> Secondly, these "obvious" benefits. If they're obvious, I want to know why
> aren't you using Eiffel? It's a programming language designed around DbC
> concepts. It's been around for three decades, at least as long as Python or
> longer. There's an existing base of compilers and support tools and
> libraries
> and textbooks and experienced programmers to work with.
>
> Could it be that Python has better libraries, is faster to develop for,
> attracts
> more programmers? If so, I suggest it's worth considering that this might
> be *because* Python doesn't have DbC.

Python is easier to write and read, and there are no libraries of
comparable quality in the Eiffel space (notably, Numpy, OpenCV, nltk and
sklearn). I really don't see how the quality of these libraries has
anything to do with the lack (or presence) of contracts. OpenCV and Numpy
have contracts all over their code (written as assertions and not
documented), albeit with very uninformative violation messages. And they
are great libraries. Their users would hugely benefit from a more mature
and standardized contracts library with informative violation messages.

*Duck Typing*
Hugh wrote:

> And I wouldn't use DbC for Python because
> I wouldn't find it helpful for the kind of dynamic, exploratory development
> I do in Python. I don't write strict contracts for Python code because in a
> dynamically typed, and duck typed, programming language they just don't
> make sense to me. Which is not to say I think Design by Contract is bad,
> just that it isn't good for Python.
>

I really don't see what DbC has to do with duck typing (unless you reduce it
to mere isinstance conditions, which would simply be a straw-man argument)
-- could you please clarify? As soon as you need to document your code, and
this is what most modules have to do in teams of more than one person
(especially so if you are developing a library for a wider audience), you
need to write down the contracts. Please see above, where I tried to
explain that 2-5) are inferior approaches to documenting contracts
compared to 1).

As I wrote above, I would be very, very thankful if you could point me to
other approaches (apart from 1-5) that are superior to contracts, or state
an argument why approaches 2-5) are superior to contracts, since that is
what I fail to see.

Cheers,
Marko

On Sun, 23 Sep 2018 at 12:34, Hugh Fisher <[email protected]> wrote:

> > Date: Sun, 23 Sep 2018 07:09:37 +0200
> > From: Marko Ristin-Kaufmann <[email protected]>
> > To: Python-Ideas <[email protected]>
> > Subject: [Python-ideas] Why is design-by-contracts not widely adopted?
>
> [ munch ]
>
> > After properly reading about design-by-contract and getting deeper into
> > the topic, there is no rational argument against it and the benefits are
> > obvious. And still, people just wave their hand and continue without
> > formalizing the contracts in the code and keep on writing them in the
> > descriptions.
>
> Firstly, I see a difference between rational arguments against Design By
> Contract (DbC) and against DbC in Python. Rejecting DbC for Python is
> not the same as rejecting DbC entirely.
>
> Programming languages are different, obviously. Python is not the same
> as C is not the same as Lisp... To me this also means that different
> languages are used for different problem domains, and in different styles
> of development. I wouldn't use DbC in programming C or assembler
> because it's not really helpful for the kind of low level close to the
> machine
> stuff I use C or assembler for. And I wouldn't use DbC for Python because
> I wouldn't find it helpful for the kind of dynamic, exploratory development
> I do in Python. I don't write strict contracts for Python code because in a
> dynamically typed, and duck typed, programming language they just don't
> make sense to me. Which is not to say I think Design by Contract is bad,
> just that it isn't good for Python.
>
> Secondly, these "obvious" benefits. If they're obvious, I want to know why
> aren't you using Eiffel? It's a programming language designed around DbC
> concepts. It's been around for three decades, at least as long as Python or
> longer. There's an existing base of compilers and support tools and
> libraries
> and textbooks and experienced programmers to work with.
>
> Could it be that Python has better libraries, is faster to develop for,
> attracts
> more programmers? If so, I suggest it's worth considering that this might
> be *because* Python doesn't have DbC.
>
> Or is this an updated version of the old saying "real programmers write
> FORTRAN in any language" ? If you are accustomed to Design by Contract,
> think of your time in the Python world as a trip to another country. Relax
> and try to program like the locals do. You might enjoy it.
>
> --
>
>         cheers,
>         Hugh Fisher
> _______________________________________________
> Python-ideas mailing list
> [email protected]
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>

