On Friday 12 March 2010, Sarah Strong wrote:
> Proposed output:
>
> ************* Module Kontroller
> W: 9: Bad indentation. Found 2 spaces, expected 4
> W: 10: Bad indentation. Found 2 spaces, expected 4
> W: 11: Bad indentation. Found 2 spaces, expected 4
> W: 12: Bad indentation. Found 2 spaces, expected 4
> [4 more Bad indentation messages, use --unabridged to display them all]
In addition to avoiding discouragement of new users, it makes the more
serious problems stand out because they are not lost in a sea of repeated
warnings. If pylint finds a bug on the first run, it makes a good first
impression on the user.
> Errors such as
> C:111:Kontroller.addWord: Invalid name "addWord" (should match
> [a-z_][a-z0-9_]{2,30}$)
> C: 89:Kontroller.checkUserId: Invalid name "checkUserId" (should match
> [a-z_][a-z0-9_]{2,30}$)
>
> are a bit confusing because the user may be unsure of why such a pattern
> match is necessary.
These regular expressions describe the naming conventions of a particular
project. There does not seem to be an official style for Python: even the
standard library uses camelCase in some modules and underscore_as_separator
in other modules.
So it is expected that a project will customize these expressions to match
the conventions of that specific project. Regular expressions are a very
good way of allowing that kind of customization. Unfortunately, not everyone
is experienced in reading them.
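For example, this is how the pattern from the messages above behaves (a
small sketch; the pattern is copied from the warning text, not from
pylint's source):

import re

# Pattern copied from the warning text above
pattern = re.compile(r"[a-z_][a-z0-9_]{2,30}$")

print(bool(pattern.match("check_user_id")))  # True: lowercase, underscores
print(bool(pattern.match("checkUserId")))    # False: capitals are rejected
print(bool(pattern.match("go")))             # False: too short (< 3 chars)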
It may be possible to generate a useful English description of the problem
by looking at the expression and how it fails to match the name. For
example, whether it encounters an invalid character (if so, indicate which
one), whether it has an issue with the length of the name etc. Maybe a nice
challenge for a student?
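To sketch the idea (the helper name and the heuristics below are invented
for illustration, not taken from pylint):

import re

def explain_name_failure(name, min_len=3, max_len=31):
    # Hypothetical helper: translate a failed name check into English
    if not name:
        return "name is empty"
    bad = sorted(set(c for c in name if not re.match(r"[a-z0-9_]", c)))
    if bad:
        return ("invalid character(s) %s: use only lowercase letters, "
                "digits and _" % ", ".join(repr(c) for c in bad))
    if name[0].isdigit():
        return "names may not start with a digit"
    if len(name) < min_len:
        return "name is too short (minimum %d characters)" % min_len
    if len(name) > max_len:
        return "name is too long (maximum %d characters)" % max_len
    return "name does not match the naming convention"

print(explain_name_failure("addWord"))
# invalid character(s) 'W': use only lowercase letters, digits and _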
> Proposed output for messages:
>
>
> +-----------+------------+-------------+
> |message id |occurrences |reference    |
> +===========+============+=============+
> |E0602      |2           |PEP 333      |
> +-----------+------------+-------------+
> |W0612      |1           |http://      |
> +-----------+------------+-------------+
> |W0301      |1           |file:///.... |
> +-----------+------------+-------------+
> |F0401      |1           |(field empty)|
> +-----------+------------+-------------+
>
> where the links might be to the pylint error code wiki, python peps,
> pylint documentation, or local documentation.
PMD, a static code checker for Java, has an explanation of every built-in
rule on its web site:
http://pmd.sourceforge.net/rules/optimizations.html#AddEmptyString
This is really useful if you are wondering why a certain rule is worth
obeying.
Maybe it could be done with systematic URLs: (example; URL does not exist)
http://www.logilab.org/project/pylint/rules/E0602
It could be a Wiki where experienced users can add the motivation behind the
checks pylint does. Adding text to a Wiki has a lower barrier to entry than
submitting a patch.
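In code, the mapping would be trivial (using the hypothetical URL scheme
from the example above):

def rule_url(msg_id):
    # Hypothetical URL scheme; as noted above, these pages do not exist
    return "http://www.logilab.org/project/pylint/rules/%s" % msg_id

print(rule_url("E0602"))
# http://www.logilab.org/project/pylint/rules/E0602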
> *Possible output improvement #3*
>
> Modify pylint's rating system not to give negative ratings out of ten.
> This doesn't match up to most people's expectations of how a rating
> works.
The rating system is part of the configuration. The default is:
evaluation=10.0 - ((float(5 * error + warning + refactor + convention) /
statement) * 10)
So it computes a penalty and then subtracts that from 10.0. Since there is
no limit to the number of issues found, the result can become negative no
matter whether you set the maximum to 10 or to 10000.
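A quick worked example (the counts are made up): a module of 10 statements
with 3 errors and 5 warnings already drops well below zero:

error, warning, refactor, convention, statement = 3, 5, 0, 0, 10
evaluation = 10.0 - ((float(5 * error + warning + refactor + convention)
                      / statement) * 10)
print(evaluation)  # -10.0: the penalty (20.0) exceeds the 10.0 ceiling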
Clipping the rating at 0 is not a good solution, since users would like to
see their rating improve when they fix issues. If the rating improves from
-6000 to -5000 but both are clipped to 0, there is no visible progress,
which is discouraging.
You could change the weight of one issue (the "* 10" in the formula) so
that any code not deliberately designed to trigger as many warnings as
possible would get a rating above 0. But then it would give overly high
marks for code that is neither very poor nor great.
A non-linear rating is probably the best solution. This also fits the
typical progression in the number of issues found: initially there will be
many violations because of consistent errors and/or lack of customization of
the configuration to a project's conventions (coding style). Fixing those is
relatively easy, so in the first phase the number of violations drops very
quickly. It would make sense that the rating would increase a bit because of
this, but not as dramatically as it does now. Conversely, the last issues to
be fixed are probably the hardest, so fixing just a handful of those should
already have a noticeable effect on the rating.
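One possible shape for such a non-linear rating (a sketch only; the
function and the steepness constant k are arbitrary, not a tuned proposal):

import math

def nonlinear_rating(error, warning, refactor, convention, statement,
                     k=0.1):
    # Same penalty as the default formula, mapped through a decaying
    # exponential so the result always stays in (0, 10]
    penalty = (float(5 * error + warning + refactor + convention)
               / statement) * 10
    return 10.0 * math.exp(-k * penalty)

print(round(nonlinear_rating(3, 5, 0, 0, 10), 2))  # 1.35 instead of -10.0
print(round(nonlinear_rating(0, 1, 0, 0, 10), 2))  # 9.05: last fixes show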
I do think the rating has some value: if you compute it over a long time,
you can get an impression of whether the quality of your code base is
improving or deteriorating. Currently the absolute score does not have much
meaning though. It would be useful if someone could attempt to tune it.
Bye,
Maarten