> On 27 Oct 2017, at 10:45, Willy Tarreau <w...@1wt.eu> wrote: > > Hi Thierry, > > On Wed, Oct 25, 2017 at 01:02:18PM +0200, Thierry Fournier wrote: >> Hi, >> >> This is a patch for lua which adds HAProxy internal regexes support. > > Thanks but I'm having a doubt regarding the risk that it introduces > bugs : > >> +static int hlua_regex_match(struct lua_State *L) >> +{ >> + struct my_regex *regex; >> + const char *str; >> + size_t len; >> + regmatch_t pmatch[20]; >> + int ret; >> + int i; >> + >> + regex = hlua_check_regex(L, 1); >> + str = luaL_checklstring(L, 2, &len); > > As you can see above, luaL_checklstring() returns a const char*, which > is also what your <str> variable is, so it clearly means that the result > is *not* supposed to be modified. > >> + ret = regex_exec_match2(regex, (char *)str, len, 20, pmatch, 0); > > And this forced cast to R/W is there to allow regex_exec_match2() to > explicitly modify this string. The function prototype is quite clear > about it : > > /* This function apply regex. It take a "char *" ans length as input. The > * <subject> can be modified during the processing. If the function doesn't > * match, it returns false, else it returns true. > * When it is compiled with standard POSIX regex or PCRE, this function add > * a temporary null chracters at the end of the <subject>. The <subject> must > * have a real length of <length> + 1. Currently the only supported flag is > * REG_NOTBOL. > */ > int regex_exec_match2(const struct my_regex *preg, char *subject, int length, > size_t nmatch, regmatch_t pmatch[], int flags) { > (...) > char old_char = subject[length]; > int match; > > flags &= REG_NOTBOL; > subject[length] = 0; > match = regexec(&preg->regex, subject, nmatch, pmatch, flags); > subject[length] = old_char; > if (match == REG_NOMATCH) > return 0; > > This happens in the case where the PCRE is not used and the libc's regex lib > is used instead. So I think it's a bit dangerous to proceed like this. First, > you never know if the byte you modify is allocated or not. Malloc() usually > returns multiple of 8 or 16 bytes, if your string is exactly 16 bytes, you > may very well temporarily overwrite the next byte which is part of a pointer > belonging to the malloc list. It's possible also that the Lua compiler will > sometimes return real constants (I don't know when, for example maybe when a > subject string matches a keywords that is part of the language and declared > in the .text section) and this will crash. Also with threads it can be tricky > as well because a trailing zero will be added during the time the regex is > executed. > > As a rule of thumb, a read-only variable pointer must never ever be forced > read-write using a cast, because even if it works for you right now, the > const modifier offers guarantees to other developers about what they can do, > and such guarantees are randomly violated by such code trying to prevent the > compiler from detecting the problem. > > I think the best thing to do simply is to memcpy() the subject string into > the trash (which is not used here). As subjects are supposed to be short > most of the time, you will not even notice any performance impact.
Good catch. It is really sad because the Lua ensure that the final '\0’ is set :-( Maybe it will be clever to add a function like regex_prepare_string which copy a const string or not according with the compilation options and the final '\0’ found or not ? Or two parameters for the regex function: "char *subject” and “const char * subject0”. These parameters are exclusive, or a flag confirming or not the final ‘\0’. Maybe simply changing the addition of the final ‘\0’ by a conditional change, and modifying the documentation of the function: If HAProxy is compiled with traditional libc regex, and the string is not null terminated, the input must not be const, otherwise the input string is not changed. Or a new function function regex_exec_match3() which takes only a null terminated string... Note that, I try to save a memcpy(), which use a little bit of CPU, just before executing a regex which heavily use the CPU, maybe it is useless. > As a bonus I'm attaching your current patch in which I fixed a few typos in > the commit message and the doc :-) Thanks. Thierry > Thanks, > Willy > <0001-MINOR-hlua-Add-regex-class.patch>