I'd like to push another option on the pile...
I want to introduce... a _rule_ constraint. Here is the example again with some
additional syntax for constraints:
@@
identifier f = rule r;
@@
* f(3)
It says: accept an identifier as f only if it satisfies the _abstract_ rule r.
What is this r then?
It is almost a normal rule, but with some subtle differences:
@abstract r script:ocaml@
@@
fun x -> not (List.mem x badnames)
The script must be a function that takes an AST as argument and returns a
boolean that indicates whether the argument is acceptable. So, when we
want to check whether an identifier satisfies r, we execute the script with
that identifier as parameter.
(if the script is python, it must be some function that takes a string as first
parameter, and perhaps instead of returning a boolean, it calls an API function
that tells whether or not the input is (un)satisfactory)
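To make the python case concrete, here is a minimal sketch of such a predicate. It only illustrates the "string in, boolean out" shape described above; the names `BADNAMES` and `acceptable` are hypothetical, the three bad names are borrowed from the badnames list quoted elsewhere in this thread, and nothing here is an existing Coccinelle API.

```python
# Hypothetical names; this is not part of Coccinelle's Python interface.
BADNAMES = {"one", "two", "three"}

def acceptable(name: str) -> bool:
    """Return True when the identifier may be bound to the metavariable."""
    return name not in BADNAMES

if __name__ == "__main__":
    # An identifier outside the list is accepted, one inside is rejected.
    print(acceptable("f"))
    print(acceptable("two"))
```

If the python side had to report its verdict through an API call instead of a return value, the body would stay the same and only the last step would change.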
Now some subtleties:
a) an abstract rule may appear anywhere in the file, as long as the meta
variables that it may inherit exist in the environment.
b) you may not inherit from an abstract rule. In case of a script this is
already the case.
c) abstract rules are only executed/matched through rule constraints
*********
This approach does not have the problems that arise when you inline scripts in
constraints and makes use of notation that you already have for scripts.
**********
As a nicer notation, we should write the constraint with a colon instead of an
equal sign:
@@
identifier f : rule r;
@@
This could be interpreted as "rule r" being the type of "f", since types are
constraints on values ;)
***********
You can still combine rule constraints with not, or, and to get something
like:
identifier f = rule a || rule b;
But if you use this combination often, you could then also make an abstract
rule c that does:
identifier f = rule c;
@abstract c@
identifier x = rule a || rule b;
@@
x
Wait a second, this abstract rule is not a script!
Yes, why limit this feature to scripts if we have such a nice language at our
disposal for writing pattern matches!
What happens here is that the body of rule c matches against an identifier x,
which is constrained to either rule a or rule b. The body may be much more
complicated, but it should probably be pure (i.e. not have + or - code)...
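One way to picture the combination is to model each rule constraint as a plain predicate and "||" as a function that builds a new predicate from two others. This is only a sketch of the idea in python, not of how SmPL would implement it; `either`, `rule_a`, `rule_b` and the naming conventions they test for are all made up for illustration.

```python
from typing import Callable

# A rule constraint modelled as a predicate over identifier names.
Predicate = Callable[[str], bool]

def either(a: Predicate, b: Predicate) -> Predicate:
    """Model of 'rule a || rule b': accept when at least one rule accepts."""
    return lambda name: a(name) or b(name)

# Hypothetical stand-ins for the abstract rules a and b.
def rule_a(name: str) -> bool:
    return name.startswith("init_")

def rule_b(name: str) -> bool:
    return name.endswith("_cleanup")

# Plays the part of abstract rule c: one name for the combination.
rule_c = either(rule_a, rule_b)

if __name__ == "__main__":
    for candidate in ("init_dev", "dev_cleanup", "probe"):
        print(candidate, rule_c(candidate))
```

Both predicates are side-effect free, which matches the remark that the body of an abstract rule should probably be pure.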
***********
I'm not sure if what I'm writing down above is already making you dizzy, but
there is more.
You can see an abstract rule as some form of "let" abstraction: let this meta
variable stand for a more complex pattern.
Here I've got a pattern where "e" stands for an integer expression that must be the addition of two other
integer expressions. We give the instantiation of the abstract rule an explicit name "r" and can then inherit
from it to get access to some of the meta variables "a" and "b" that it matches:
@@
int e = rule add as r;
int r.a;
int r.b;
@@
- f(e)
+ (a - b)
So, the add rule simply matches "a + b":
@abstract add@
int a;
int b;
@@
a + b
***********
I hope I did not dazzle you too much with this. I think that at least the very
first part of this message deserves some attention.
Cheers,
Adriaan
Julia,
This proposal sounds very useful to me.
I would not want to remove any of the existing functionality.
As it currently stands cocci can be used by people who only
know C and I think it would be useful to keep this ability.
Relating to your C++ post earlier this week: I think it was in
the late 90s that somebody told me that writing a C++ compiler
was about seven times the effort of writing a C compiler. A lot
of new stuff has been added in C++11 and not so much in C1x,
so I suspect the ratio will have gone up.
Perhaps a solution would be to allow scripting code in metavariable
declarations. Then we could in principle get rid of all sorts of
constraints. There would be no need to learn the SmPL constraints and
where they could occur. One would just have to remember the syntax of
one's preferred scripting language (from among the options available :).
So for example, one could write:
@initialize:ocaml@
let badnames = ["one";"two";"three"]
@@
identifier x where ocaml{not (List.mem x badnames)};
@@
*f(3)
I imagine that it is possible to do the same thing in python.
Similarly, one could get rid of the regular expression matching
notation. I assume python provides something for that, for those that
don't want to use ocaml. Note that the interaction with python might be
less efficient than the current native ocaml version.
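For the regular expression case, python's standard `re` module would be enough. The sketch below shows what a script-based replacement for a regexp constraint might look like on the python side; the pattern and the function name `matches` are hypothetical, and how such a function would be hooked into a metavariable declaration is left open.

```python
import re

# Hypothetical pattern: identifiers that start with "get_" or "set_".
PATTERN = re.compile(r"^(get|set)_")

def matches(name: str) -> bool:
    """Return True when the identifier name matches the pattern."""
    return PATTERN.search(name) is not None

if __name__ == "__main__":
    for candidate in ("get_value", "set_x", "reset_value"):
        print(candidate, matches(candidate))
```

Compiling the pattern once, outside the function, avoids repeating that work for every candidate identifier, which matters if the predicate is called often.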
One could also get rid of the subterm notation, expression e <= r.e1;,
although that would currently require someone who wants this
functionality to use ocaml, because currently only ocaml code gets a
representation of the abstract syntax tree.
An issue is what metavariables this code can use. In the above, I have
assumed that the ocaml code is implicitly parameterized by the
metavariable that is being declared. It would be too complicated to
allow the code to have access to other metavariables being declared at
the same time. But if an appropriate syntax for declaring them could be
found, it would be possible to allow metavariables to be inherited from
previous rules. Currently we have @@ to separate metavariable
declarations from the script code. Perhaps we could use that, although
it seems a bit ugly. Another option would be to have no separator. The
end of the metavariable list would be the occurrence of the last x << r.y;
What do you think?
julia
_______________________________________________
Cocci mailing list
[email protected]
http://lists.diku.dk/mailman/listinfo/cocci
(Web access from inside DIKUs LAN only)
--
Derek M. Jones tel: +44 (0) 1252 520 667
Knowledge Software Ltd blog:shape-of-code.coding-guidelines.com
Source code analysis http://www.knosof.co.uk