https://bugs.exim.org/show_bug.cgi?id=2315
--- Comment #8 from Rich Siegel <[email protected]> ---
Right you are - after building a clean set of sources and re-testing (I had to
set PCRE_SUPPORT_UNICODE_PROPERTIES = 1 first), my PCRE1 test case started to
fail.

I believe I have figured out what's going on: I diffed the CMakeCache.txt from
my svn working copy against the one from the clean sources. What I found was
this:

- in the "clean" PCRE1 build, CMAKE_C_FLAGS was empty:

    //Flags used by the C compiler during all build types.
    CMAKE_C_FLAGS:STRING=

- in my svn working copy of PCRE1, which was able to match LF line breaks when
  "\r" occurred in a pattern, I had:

    //Flags used by the compiler during all build types.
    CMAKE_C_FLAGS:STRING=-mno-mmx -mno-sse -DLINK_SIZE=4 -DESC_r=CHAR_LF

Clearly, the "ESC_r=CHAR_LF" is the salient difference, and it explains not
only (a) the behavior difference between my working copy and the "clean" PCRE1
build, but also (b) why "\r" in a pattern will match an LF that occurs in the
subject in the PCRE1 that I've been using.

Now, here's where the plot thickens. I was staring at this configuration and
wondering why it didn't work in PCRE2. So I did some searching of the sources,
and determined that PCRE2 doesn't use "ESC_r" anymore, so it didn't matter at
all that I was setting it in the compiler flags. I suspect that this feature
was lost in the new work that was done for PCRE2.

I'd be grateful if there could be some supported way to determine at compile
time whether "\r" maps to CHAR_CR (factory default, to preserve current
behavior), CHAR_LF ("\r" matches LF, my current use case), or something else.

Meanwhile, I've made two changes:

- in pcre2_internal.h, I brought over a conditional definition of "ESC_r", so
  that if it's not explicitly set in the compiler flags, it'll default to
  CHAR_CR.

- in pcre2_compile.c, I changed "CHAR_CR" in the escape tables to use "ESC_r".

Here are the patches.

===================================================================
--- pcre2_compile.c (revision 1003)
+++ pcre2_compile.c (working copy)
@@ -521,7 +521,7 @@
      0, 0,
      CHAR_LF, 0,
      -ESC_p, 0,
-     CHAR_CR, -ESC_s,
+     ESC_r, -ESC_s,
      CHAR_HT, 0,
      -ESC_v, -ESC_w,
      0, 0,
@@ -549,7 +549,7 @@
 /* 80 */ CHAR_BEL, -ESC_b, 0, -ESC_d, CHAR_ESC, CHAR_FF, 0,
 /* 88 */ -ESC_h, 0, 0, '{', 0, 0, 0, 0,
 /* 90 */ 0, 0, -ESC_k, 0, 0, CHAR_LF, 0, -ESC_p,
-/* 98 */ 0, CHAR_CR, 0, '}', 0, 0, 0, 0,
+/* 98 */ 0, ESC_r, 0, '}', 0, 0, 0, 0,
 /* A0 */ 0, '~', -ESC_s, CHAR_HT, 0, -ESC_v, -ESC_w, 0,
 /* A8 */ 0, -ESC_z, 0, 0, 0, '[', 0, 0,
 /* B0 */ 0, 0, 0, 0, 0, 0, 0, 0,
===================================================================
--- pcre2_internal.h (revision 1003)
+++ pcre2_internal.h (working copy)
@@ -369,6 +369,10 @@
 Any changes should ensure that the various macros are kept in step with each
 other. NOTE: The values also appear in pcre2_jit_compile.c. */
 
+#ifndef ESC_r
+#define ESC_r CHAR_CR
+#endif
+
 /* -------------- ASCII/Unicode environments -------------- */
 
 #ifndef EBCDIC
