> On 27 Oct 2017, at 10:45, Willy Tarreau <w...@1wt.eu> wrote:
> 
> Hi Thierry,
> 
> On Wed, Oct 25, 2017 at 01:02:18PM +0200, Thierry Fournier wrote:
>> Hi,
>> 
>> This is a patch for lua which adds HAProxy internal regexes support.
> 
> Thanks but I'm having a doubt regarding the risk that it introduces
> bugs :
> 
>> +static int hlua_regex_match(struct lua_State *L)
>> +{
>> +    struct my_regex *regex;
>> +    const char *str;
>> +    size_t len;
>> +    regmatch_t pmatch[20];
>> +    int ret;
>> +    int i;
>> +
>> +    regex = hlua_check_regex(L, 1);
>> +    str = luaL_checklstring(L, 2, &len);
> 
> As you can see above, luaL_checklstring() returns a const char*, which
> is also what your <str> variable is, so it clearly means that the result
> is *not* supposed to be modified.
> 
>> +    ret = regex_exec_match2(regex, (char *)str, len, 20, pmatch, 0);
> 
> And this forced cast to R/W is there to allow regex_exec_match2() to
> explicitly modify this string. The function prototype is quite clear
> about it :
> 
>  /* This function apply regex. It take a "char *" ans length as input. The
>   * <subject> can be modified during the processing. If the function doesn't
>   * match, it returns false, else it returns true.
>   * When it is compiled with standard POSIX regex or PCRE, this function add
>   * a temporary null chracters at the end of the <subject>. The <subject> must
>   * have a real length of <length> + 1. Currently the only supported flag is
>   * REG_NOTBOL.
>   */
>  int regex_exec_match2(const struct my_regex *preg, char *subject, int length,
>                        size_t nmatch, regmatch_t pmatch[], int flags) {
>  (...)
>       char old_char = subject[length];
>       int match;
> 
>       flags &= REG_NOTBOL;
>       subject[length] = 0;
>       match = regexec(&preg->regex, subject, nmatch, pmatch, flags);
>       subject[length] = old_char;
>       if (match == REG_NOMATCH)
>               return 0;
> 
> This happens in the case where the PCRE is not used and the libc's regex lib
> is used instead. So I think it's a bit dangerous to proceed like this. First,
> you never know if the byte you modify is allocated or not. Malloc() usually
> returns multiple of 8 or 16 bytes, if your string is exactly 16 bytes, you
> may very well temporarily overwrite the next byte which is part of a pointer
> belonging to the malloc list. It's possible also that the Lua compiler will
> sometimes return real constants (I don't know when, for example maybe when a
> subject string matches a keywords that is part of the language and declared
> in the .text section) and this will crash. Also with threads it can be tricky
> as well because a trailing zero will be added during the time the regex is
> executed.
> 
> As a rule of thumb, a read-only variable pointer must never ever be forced
> read-write using a cast, because even if it works for you right now, the
> const modifier offers guarantees to other developers about what they can do,
> and such guarantees are randomly violated by such code trying to prevent the
> compiler from detecting the problem.
> 
> I think the best thing to do simply is to memcpy() the subject string into
> the trash (which is not used here). As subjects are supposed to be short
> most of the time, you will not even notice any performance impact.


Good catch.

It is really sad because the Lua ensure that the final '\0’ is set :-( Maybe
it will be clever to add a function like regex_prepare_string which copy a
const string or not according with the compilation options and the final '\0’
found or not ?

Or two parameters for the regex function: "char *subject” and
“const char * subject0”. These parameters are exclusive, or a flag confirming
or not the final ‘\0’.

Maybe simply changing the addition of the final ‘\0’ by a conditional change,
and modifying the documentation of the function:

   If HAProxy is compiled with traditional libc regex, and the string is not
   null terminated, the input must not be const, otherwise the input string is
   not changed.

Or a new function function regex_exec_match3() which takes only a null 
terminated
string...

Note that, I try to save a memcpy(), which use a little bit of CPU, just before
executing a regex which heavily use the CPU, maybe it is useless.


> As a bonus I'm attaching your current patch in which I fixed a few typos in
> the commit message and the doc :-)


Thanks.

Thierry


> Thanks,
> Willy
> <0001-MINOR-hlua-Add-regex-class.patch>


Reply via email to