On Friday 12 March 2010, Sarah Strong wrote:

> Proposed output:
> 
> ************* Module Kontroller
> W:  9: Bad indentation. Found 2 spaces, expected 4
> W: 10: Bad indentation. Found 2 spaces, expected 4
> W: 11: Bad indentation. Found 2 spaces, expected 4
> W: 12: Bad indentation. Found 2 spaces, expected 4
> [4 more Bad indentation messages, use --unabridged to display them all]

In addition to avoiding discouragement of new users, it makes the more 
serious problems stand out because they are not lost in a sea of repeated 
warnings. If pylint finds a bug on the first run, it makes a good first 
impression on the user.
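
A minimal sketch of the abridging itself (the message objects and their
msg_id attribute are hypothetical, not pylint's actual reporter API):

  from itertools import groupby

  LIMIT = 4  # show at most this many copies of one message type

  def abridge(messages):
      # Collapse runs of same-type messages, keeping only the first
      # LIMIT of each run. Assumes repeats are consecutive, as in the
      # output above.
      for msg_id, group in groupby(messages, key=lambda m: m.msg_id):
          run = list(group)
          for msg in run[:LIMIT]:
              yield str(msg)
          if len(run) > LIMIT:
              yield ("[%d more %s messages, use --unabridged to display"
                     " them all]" % (len(run) - LIMIT, msg_id))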

> Errors such as
> C:111:Kontroller.addWord: Invalid name "addWord" (should match
> [a-z_][a-z0-9_]{2,30}$)
> C: 89:Kontroller.checkUserId: Invalid name "checkUserId" (should match
> [a-z_][a-z0-9_]{2,30}$)
> 
> are a bit confusing because the user may be unsure of why such a pattern
> match is necessary.

These regular expressions describe the naming conventions of a particular 
project. There does not seem to be an official style for Python: even the 
standard library uses camelCase in some modules and underscore_as_separator 
in other modules.

So it is expected that a project will customize these expressions to match 
the conventions of that specific project. Regular expressions are a very 
good way of allowing that kind of customization. Unfortunately, not everyone 
is experienced in reading them.
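
For example, a project that prefers camelCase methods could override the
relevant patterns in its pylintrc (method-rgx and function-rgx are option
names in pylint's [BASIC] checker; the pattern itself is just an
illustration):

  [BASIC]
  # allow camelCase as well as underscores
  method-rgx=[a-z_][a-zA-Z0-9_]{2,30}$
  function-rgx=[a-z_][a-zA-Z0-9_]{2,30}$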

It may be possible to generate a useful English description of the problem 
by looking at the expression and how it fails to match the name: for 
example, whether the name contains an invalid character (if so, indicate 
which one), or whether its length is the problem. Maybe a nice challenge 
for a student?
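
As a rough sketch of the idea, hard-coded to the default pattern quoted
above rather than derived from an arbitrary expression:

  import re

  NAME_RE = re.compile(r"[a-z_][a-z0-9_]{2,30}$")

  def explain_mismatch(name):
      # Return a human-readable reason why `name` violates NAME_RE,
      # or None if it matches.
      if NAME_RE.match(name):
          return None
      bad = sorted(set(c for c in name if not re.match(r"[a-z0-9_]", c)))
      if bad:
          return ("invalid character(s): %s (use lowercase letters,"
                  " digits and _)" % ", ".join(bad))
      if len(name) < 3:
          return "too short (minimum 3 characters)"
      if len(name) > 31:
          return "too long (maximum 31 characters)"
      return "must not start with a digit"

  print(explain_mismatch("addWord"))  # invalid character(s): W ...
  print(explain_mismatch("go"))       # too short (minimum 3 characters)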

> Proposed output for messages:
> 
> +-----------+------------+-------------+
> |message id |occurrences |reference    |
> +===========+============+=============+
> |E0602      |2           |PEP 333      |
> +-----------+------------+-------------+
> |W0612      |1           |http://      |
> +-----------+------------+-------------+
> |W0301      |1           |file:///.... |
> +-----------+------------+-------------+
> |F0401      |1           |(field empty)|
> +-----------+------------+-------------+
> 
> where the links might be to the pylint error code wiki, python peps,
> pylint documentation, or local documentation.

PMD, a static code checker for Java, has an explanation of every built-in 
rule on its web site:
  http://pmd.sourceforge.net/rules/optimizations.html#AddEmptyString
This is really useful when you are wondering why a certain rule is worth 
obeying.

Maybe it could be done with systematic URLs (example; this URL does not 
exist):
  http://www.logilab.org/project/pylint/rules/E0602
It could be a Wiki where experienced users can add the motivation behind the 
checks pylint does. Adding text to a Wiki has a lower barrier to entry than 
submitting a patch.
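
With a systematic scheme, a reporter could fill the reference column
mechanically; a minimal sketch, assuming the (still nonexistent) URL
layout above:

  RULES_ROOT = "http://www.logilab.org/project/pylint/rules/"

  def rule_reference(msg_id):
      # Build the documentation link for a message id such as 'E0602'.
      return RULES_ROOT + msg_id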
 
> *Possible output improvement #3*
> 
> Modify pylint's rating system not to give negative ratings out of ten.
> This doesn't match up to most people's expectations of how a rating
> works.

The rating system is part of the configuration. The default is:

  evaluation=10.0 - ((float(5 * error + warning + refactor + convention)
                      / statement) * 10)

So it computes a penalty and then subtracts it from 10.0. Since there is 
no limit to the number of issues found, the result can become negative no 
matter whether you set the maximum to 10 or to 10000.
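
To see how quickly it goes below zero, here is the same expression wrapped
in a function purely for illustration:

  def default_rating(error, warning, refactor, convention, statement):
      # The default evaluation expression, verbatim.
      return 10.0 - ((float(5 * error + warning + refactor + convention)
                      / statement) * 10)

  # 300 warnings in a 100-statement module is already far below zero:
  print(default_rating(0, 300, 0, 0, 100))  # -20.0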

Clipping the rating at 0 is not a good solution, since users would like to 
see their rating improve when they fix issues. If the rating improves from 
-6000 to -5000 but both are clipped to 0, there is no visible progress, 
which is discouraging.

You could change the weight of one issue (the "* 10" in the formula) so 
that any code that is not deliberately designed to trigger as many warnings 
as possible would get a rating above 0. But then it would give overly high 
marks to code that is neither very poor nor great.

A non-linear rating is probably the best solution. This also fits the 
typical progression in the number of issues found: initially there will be 
many violations, because of consistent errors and/or a configuration that 
has not yet been customized to the project's conventions (coding style). 
Fixing those is relatively easy, so in the first phase the number of 
violations drops very quickly. It would make sense for the rating to 
increase a bit because of this, but not as dramatically as it does now. 
Conversely, the last issues to be fixed are probably the hardest, so fixing 
just a handful of those should already have a noticeable effect on the 
rating.
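
One curve with that shape is exponential decay: it is steepest near zero
penalty (so the last few fixes move the rating most) and flattens out for
very messy code, without ever going negative. A minimal sketch, where the
decay constant k is an arbitrary tuning knob, not an established value:

  import math

  def rating(error, warning, refactor, convention, statement):
      # Non-linear rating in (0, 10]: 10.0 for clean code, never negative.
      # Same penalty weights as the default linear formula.
      penalty = float(5 * error + warning + refactor + convention) / statement
      k = 0.5  # arbitrary; needs the tuning mentioned below
      return 10.0 * math.exp(-k * penalty)

  print(rating(0, 0, 0, 0, 100))    # 10.0  (clean)
  print(rating(0, 10, 0, 0, 100))   # ~9.5  (a few warnings)
  print(rating(0, 300, 0, 0, 100))  # ~2.2  (messy, but still positive)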

I do think the rating has some value: if you compute it over a long period, 
you can get an impression of whether the quality of your code base is 
improving or deteriorating over time. Currently the absolute score does not 
have much meaning, though. It would be useful if someone could attempt to 
tune it.

Bye,
                Maarten