Adrien Schildknecht <[email protected]> writes:
> Add regexp based on the "Shell Command Language" specifications.
> Because of the lax syntax of sh, some corner cases may not be
> handled properly.
>
> Signed-off-by: Adrien Schildknecht <[email protected]>
> ---
Those of you who helped in the first round of review, any comments,
"This round looks good"'s, ...?
> +PATTERNS("sh",
> + "^([ \t]*(function[ \t]+)?[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\).*)$",
> + /* -- */
I do not think it is wrong per se to try to be as precise as
possible, but I wonder if it is sufficient to cheat and make these
"what is a word?" expressions a bit looser, by declaring that it is
OK if a simpler pattern allows something that is syntactically
illegal in shell, as long as it splits valid shell constructs
correctly. For example:
> + "[a-zA-Z0-9_]+"
> + "|[-+0-9]+"
The first one matches an identifier (e.g. if you have frotz="a b c"
and $frotz, both appearances of 'frotz' are matched), and the second
one, I think, is trying to catch possibly signed integers. But the
latter also matches 0+1+++2, so it is already loose (I do not
think that is a problem). Perhaps it is sufficient to collapse the
above into a single "[-+a-zA-Z0-9_$]+"?
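
To make the comparison concrete, here is a minimal sketch that splits
a string with either form. It assumes the patterns are compiled as
POSIX extended regexps with regcomp(), and count_words() is a
hypothetical helper written only for this illustration, not anything
in git:

```c
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of git): print every successive match
 * of the POSIX extended regex `pattern` in `text` and return how many
 * there were, or -1 if the pattern does not compile.
 */
static int count_words(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	/* Characters between matches are simply skipped over. */
	while (*text && !regexec(&re, text, 1, &m, 0) && m.rm_eo > m.rm_so) {
		printf("[%.*s]", (int)(m.rm_eo - m.rm_so), text + m.rm_so);
		text += m.rm_eo;
		count++;
	}
	printf("\n");
	regfree(&re);
	return count;
}
```

With this helper, both the original pair "[a-zA-Z0-9_]+|[-+0-9]+" and
the collapsed class find four words in frotz="a b c". They differ on
$frotz, where the original yields "frotz" and the collapsed class
yields "$frotz", and on x-1, which the original splits into "x" and
"-1" while the collapsed class keeps it as one word.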
> + "|[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]"
> + "|>\\||[<>]+&|<>|<<-|;;"),
Likewise. I wonder if something like "[-~!@#%^&*+=|;/]+" gives too
many false matches.
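
One way to get a feel for the false matches is to run both operator
patterns over a few snippets. The sketch below again assumes POSIX
extended regexps via regcomp(); patch_ops, loose_ops and count_ops()
are hypothetical names used only for this illustration:

```c
#include <regex.h>
#include <stdio.h>

/* Operator alternatives as quoted from the patch. */
static const char patch_ops[] =
	"[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]"
	"|>\\||[<>]+&|<>|<<-|;;";
/* The looser single character class floated above. */
static const char loose_ops[] = "[-~!@#%^&*+=|;/]+";

/*
 * Hypothetical helper (not part of git): print each successive match
 * of `pattern` in `text`, return the number of matches, or -1 if the
 * pattern does not compile.
 */
static int count_ops(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (*text && !regexec(&re, text, 1, &m, 0) && m.rm_eo > m.rm_so) {
		printf("[%.*s]", (int)(m.rm_eo - m.rm_so), text + m.rm_so);
		text += m.rm_eo;
		count++;
	}
	printf("\n");
	regfree(&re);
	return count;
}
```

On a&&b||c both patterns find "&&" and "||". On x=-1 the loose class
glues "=-" into one token where the patch's alternatives match
nothing, and on cmd >>file the loose class as written finds nothing
because it lacks '<' and '>', while the patch's pattern finds ">>".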
> { "default", NULL, -1, { NULL, 0 } },
> };
> #undef PATTERNS