Adrien Schildknecht <adrien+...@schischi.me> writes:

> Add regexp based on the "Shell Command Language" specifications.
> Because of the lax syntax of sh, some corner cases may not be
> handled properly.
>
> Signed-off-by: Adrien Schildknecht <adrien+...@schischi.me>
> ---

Those of you who helped in the first round of review, any comments,
"This round looks good"'s, ...?

> +PATTERNS("sh",
> +     "^([ \t]*(function[ \t]+)?[a-zA-Z_][a-zA-Z0-9_]*[ \t]*\\([ \t]*\\).*)$",
> +     /* -- */

I do not think it is wrong per-se to try to be as precise as
possible, but I wonder if it is sufficient to cheat and make these
"what is a word?" expressions a bit looser, by declaring that it is
OK if a simpler pattern allows something that are syntactically
illegal in shell, as long as it splits valid shell construct
correctly.  For example:

> +      "[a-zA-Z0-9_]+"
> +      "|[-+0-9]+"

The first one matches an identifier (e.g. If you have frotz="a b c"
and $frotz, two appearances of 'frotz' are matched) and the second
one I think is trying to catch possibly signed integers, but the
latter also matches 0+1+++2 which is already loose (but I do not
think it is a problem).  Perhaps it is sufficient to collapse the
above into a single "[-+a-zA-Z0-9_$]+"?

> +      "|[-+*/<>%&^|=!]=|>>=?|<<=?|\\+\\+|--|\\*\\*|&&|\\|\\||\\[\\[|\\]\\]"
> +      "|>\\||[<>]+&|<>|<<-|;;"),

Likewise.  I wonder if something like "[-~!@#%^&*+=|;/]+" gives too
many false matches.

>  { "default", NULL, -1, { NULL, 0 } },
>  };
>  #undef PATTERNS
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to