Re: Syntax checks in perl (was: Re: maint.mk syntax check problems)

Stefano Lattarini Thu, 15 Sep 2011 12:09:11 -0700

Hi Martin.

On Thursday 15 September 2011, Martin von wrote:
> On 15.09.2011 11:37, Jim Meyering wrote:
> > I'm sure that a perl-based
> > implementation would be far more efficient, and probably faster
> > even if the perl implementation doesn't run its tests in parallel.
> > 
> > Perl is well suited to this task.
> > I'm sure some will object to Perl's syntax, but not I.
> 
> Not that I object to Perl, but as we are discussing different languages,
> I'd like to offer Python as an alternative to consider. Main benefits:
> - threading and therefore parallel tests
>
Perl versions from 5.8 onwards should have a threading interface too,
if I'm not mistaken.  I don't know how powerful or easy-to-use it is,
though.


> - clean exception handling, allowing checks to fail in deeply nested
>   code and still recover to proceed with the next check
>
That is possible in perl too (even if it requires jumping through various
hoops, while being very natural in python).
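
To make the perl side concrete, here is a minimal sketch of the usual
"eval BLOCK" idiom.  The check names and the helper subs are made up for
illustration only; nothing here is taken from the RFC patch.

```perl
use strict;
use warnings;

# Hypothetical helpers standing in for deeply nested check code.
sub helper_level_1 { helper_level_2() }
sub helper_level_2 { die "problem found in nested code\n" }

# Hypothetical checks: each entry is a name plus a code reference.
my @checks = (
    [ 'always-ok',   sub { return 1 } ],
    [ 'fails-deep',  sub { helper_level_1(); return 1 } ],
    [ 'runs-anyway', sub { return 1 } ],
);

my %result;
for my $check (@checks) {
    my ($name, $code) = @$check;
    # eval BLOCK traps any die raised below it; the trailing "1" lets us
    # tell a clean run apart from an aborted one.
    my $ok = eval { $code->(); 1 };
    $result{$name} = $ok ? 'ok' : "FAILED: $@";
}

# Report in the original order; the failure did not stop later checks.
for my $check (@checks) {
    my $name   = $check->[0];
    my $status = $result{$name};
    chomp $status;
    print "$name: $status\n";
}
```

The "hoops" are mostly about $@ handling and about the lack of typed
exceptions in core perl; the basic recover-and-continue pattern is short.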

> - somewhat more legible syntax, I believe
>
Let's say that at least python makes it much easier to write legible
code; while doing so is definitely possible with perl too, it requires
more self-discipline.

Still, while I mostly prefer python to perl too these days, we have to
consider that perl is already a pre-requisite for projects using autoconf,
automake, and/or gnulib, so using perl would have the advantage of not
adding another requirement to the maintainer toolchain.

> I'm not sure if these benefits warrant adding another scripting language
> both to the set of tools maintainers are expected to have around and to
> the set of languages to maintain within gnulib.
>
Right; for this reason, I'm 60/40 against the use of python.  But I'd like
to hear the opinion of the most active gnulibers and autoconfers on this.

> > With a good Perl-based harness, I'll certainly be glad to phase
> > out (of projects I maintain) the make-based tests.
> 
> Me, too.
>
Good!  But consider that it might take some time before I post something
usable (a month or two isn't an unreasonable guess).

> On 15.09.2011 11:14, Stefano Lattarini wrote:
> > About a year ago I proposed a similar move for automake's own
> > maintainer checks; see this RFC patch:
> > 
> > <http://lists.gnu.org/archive/html/automake-patches/2010-07/msg00081.html>
> 
> A good start! My main concerns are that for one, the framework might not
> be flexible enough.
>
It's definitely not flexible enough -- it was only a rough draft, meant
to be used only to check shell and make code (and, to a lesser degree, perl
code).

> The approach is well suited for checks processing
> one line at a time, but checks that operate on the file text as a whole,
> or that even pass the file to some other tool (e.g. indent), are rather
> difficult to express. It would be nice if such test types could at least
> be added later on with reasonable overhead.
>
+1

> To add some flexibility for future extensions, I believe that it would
> be good to use some OOP approach, i.e. have test classes.
>
Good idea (but let's call those "check classes", please :-)

Two caveats though, if we go down this road:
  - We should be careful not to over-engineer, especially in the earlier
    phases.
  - We should have a testsuite for the new code; since this new code would
    mostly be intended for in-house use at first, we don't need a really
    "industrial-strength" coverage, but some automated testing will be
    definitely required.

> Instances of each class could be configured using keywords, which I
> very much like about that approach.
>
Also, it should be very very easy for a maintainer to add a new check,
or to whitelist false positives he's experiencing in one of the "built-in"
checks, *without* requiring from him an in-depth knowledge of the new
checking system or of perl.

> Most current checks, both from that batch and
> maint.mk, would probably be instances of some regexp-checking class.
> But others could be added later on.
> 
> Perhaps the regular expressions could operate on the whole file by
> default,
>
Or we should provide a config variable to decide the "default matching
unit" -- file or line.

> although that makes obtaining the offending pieces of code a
> bit harder. But line breaks don't really matter in C, so looking even
> across them would be the right thing to do for many checks.
>
At least, for checks intended for C files.  Things might be different for
checks aimed at makefile fragments or shell scripts.
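
A rough sketch of how one regexp could be applied with either matching
unit, while still recovering line numbers in whole-file mode.  The helper
name, its arguments and the sample regexp are assumptions made up for this
example, not part of any existing script.

```perl
use strict;
use warnings;

# Hypothetical helper: return the line numbers where $regexp matches,
# treating either each line or the whole text as the matching unit.
sub find_matches {
    my ($text, $regexp, $unit) = @_;
    my @hits;
    if ($unit eq 'line') {
        my $lineno = 0;
        for my $line (split /\n/, $text) {
            $lineno++;
            push @hits, $lineno if $line =~ $regexp;
        }
    } else {
        while ($text =~ /$regexp/g) {
            # $-[0] is the offset where the match starts; counting the
            # newlines before it gives the line of the offending code.
            my $before = substr($text, 0, $-[0]);
            push @hits, 1 + ($before =~ tr/\n//);
        }
    }
    return @hits;
}

# A call to "foo" split across a line break, then one on a single line.
my $c_code = "x = foo\n  (1);\ny = foo (2);\n";
my $re     = qr/\bfoo\s+\(/;

my @by_line = find_matches($c_code, $re, 'line');
my @by_file = find_matches($c_code, $re, 'file');
print "line mode: @by_line\n";
print "file mode: @by_file\n";
```

In line mode only the single-line call is seen; in file mode the call
broken across lines is caught as well, and the extra work to report where
it starts stays small.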

> Some checks might operate on a different set of files than others, e.g.
> generated files instead of version-controlled ones. Current maint.mk
> does that for sc_po_check as well as those checks passing in_files to
> _sc_search_regexp: sc_copyright_check, sc_Wundef_boolean and
> sc_vulnerable_makefile_CVE-2009-4029. So file name alone isn't enough.
>
The code could operate adaptively: if the "list of files" is an actual
list reference, just use it; if it's a scalar, pass it to the 'glob()'
builtin to obtain a list of files; if it is a code reference, call it
(with arguments? and if so, which ones? that's to be decided) to fetch the
list of files; and so on (we could devise semantics also for hash
references, regular expression objects, or custom objects, maybe).
The tricky part will be to decide how to operate properly under VPATH
builds ...
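
Something along these lines, as a minimal sketch only: the sub name and the
way arguments are forwarded to the callback are assumptions, and the VPATH
issue is deliberately left out.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a "list of files" specification into a
# plain list, dispatching on what kind of value was given.
sub resolve_files {
    my ($spec, @args) = @_;
    my $type = ref $spec;
    return @$spec         if $type eq 'ARRAY';  # explicit list: use as is
    return $spec->(@args) if $type eq 'CODE';   # callback: let it decide
    return glob($spec)    if $type eq '';       # plain scalar: glob pattern
    die "unsupported file specification of type $type\n";
}

my @explicit = resolve_files([ 'Makefile.am', 'configure.ac' ]);
my @computed = resolve_files(sub { map { "$_.mk" } @_ }, 'maint', 'cfg');
print "explicit: @explicit\n";
print "computed: @computed\n";
```

Note that the 'glob()' builtin splits its argument on whitespace, so a
scalar can carry several patterns at once; whether that is a feature or a
trap for file names containing spaces is one more thing to decide.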

> In case some more complicated checks want to exit a single check from
> somewhere inside nested code, I would like to wrap all check execution
> in "eval { ... }" so that a "die" within that code can be recovered
> from. Although I must confess that this would make more sense with a
> one-check-at-a-time loop as the outermost one, whereas the proposed perl
> script does one-file-at-a-time, saving io but causing repeated check calls.
>
Of course, I agree with you that this aspect of the prototype will have to
be changed, especially if we want to allow matching on entire-file level.

> The proposed script apparently has no means of configuration so far. I
> guess it would be great if the configuration file were a perl script
> itself,
>
That was my idea of what should be done in the long run.  In fact, the
ChangeLog entry of my RFC patch reads:

  Currently, this is a monolithic script, but it allows the selection
  of a subset of checks to be run (i.e. it doesn't force all tests to
  be run).  Also, it could be easily modified to allow placing the
  definitions of checks (and the list of files these checks are applied
  to) into an external "config" file (basically, this would most
  probably be a perl script to be sourced with the "do" perl builtin).

> so it could not only modify the configuration affecting existing
> checks, but even add completely new checks specific to a given project.
> 
Obviously +1 from me here.
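
For the record, a minimal sketch of what sourcing such a config with "do"
could look like.  The %checks table, its keys and the check names are
hypothetical, and the config file is generated on the fly here only to
keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical table of built-in checks.  It is a package variable
# because a file run through "do" cannot see our lexical variables.
our %checks = (
    trailing_space => { regexp => qr/[ \t]+$/, unit => 'line' },
);

# Stand-in for a project's config file; a real one would live in the
# source tree rather than being written out like this.
my ($fh, $config) = tempfile(SUFFIX => '.pl', UNLINK => 1);
print {$fh} <<'EOF';
# Add a project-specific check and whitelist a file for a built-in one.
$main::checks{no_tabs} = { regexp => qr/\t/, unit => 'line' };
$main::checks{trailing_space}{skip} = [ 'ChangeLog' ];
1;
EOF
close $fh or die "cannot close $config: $!\n";

# "do FILE" compiles and runs the file, returning the value of its last
# statement; $@ holds compile/run-time errors, $! read errors.
my $rv = do $config;
die "cannot parse $config: $@" if $@;
die "cannot read $config: $!\n" unless defined $rv;

print join(', ', sort keys %checks), "\n";
```

Since the config is ordinary perl, a maintainer can add or tweak a check
by assigning one hash entry, which should keep the entry barrier low.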

> One more thing: at least for me, the above link does obfuscate large
> parts of perl code which it incorrectly considers to contain e-mail
> addresses, replacing those portions with "address@hidden". So some other
> archived version of that mail must be used, e.g.
> http://article.gmane.org/gmane.comp.sysutils.automake.patches/4302
> 
> Glad things were set in motion,
>  Martin
>

Thanks for the feedback,
  Stefano
