Guillaume Munch wrote:

> Le 20/11/2015 19:43, Georg Baum a écrit :
>>
>> It does. The reason is that the output of escape_special_chars() changes
>> when compiling with std::regex. The attached patch would fix the test for
>> std::regex, but then make check would fail when using boost::regex.
> 
> Of course, the source should be corrected, not the test.

Sure. This was to demonstrate the differences.

> BTW I read the new documentation for tests and it did not help me at
> all. It said nothing for "make check", and when I run "make check" it is
> still a mystery to me where I can get any log file that says what went
> wrong. (The file test_biblio.log is always empty.)

Yes, these tests are unfortunately not documented yet. Apart from that: The 
autotools test machinery is too complicated. I do not know an equivalent for 
make check, but you can run the equivalent of make alltests also with cmake 
which I would recommend:

cd $BUILDTREE/src/test
make test

>> This shows that the regex engines we use do still interpret the
>> expressions in different ways, and I believe I also know the reason: The
>> expressions have changed to ECMAScript syntax, but the boost::regex
>> engine does not default to ECMAScript, but to boost perl style.
> 
> But "perl" for Boost is just a synonym for "ECMAScript", and the
> difference between std and boost in the current master is only
> supposed to be a few perl extensions added. (See
> 
<http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_perl.html>.)

Yes, I realized this now as well after reading the sources. I was lead on 
the wrong track by the MSVC workaround in regex.h, and the fact that 
boost::regex also has some undocumented features (like the one I got rid of 
with dfe3a7d9fc57)

> Therefore the problem is probably still in the expressions. What I
> understand from your patch is that the output in std::regex is still
> wrong: the regex adds \\ instead of \. The result of the test you showed
> with the patch. I am worried by the line:
> 
>    return lyx::regex_replace(expr, reg, "\\\\$&");
> 
> This should be:
> 
>    return lyx::regex_replace(expr, reg, "\\$&");
> 
> but then this would no longer work with Boost, because it interprets the
> backslash as an escape sequence... This appears to be a bug in Boost:
> 
>    format_default: Specifies that when a regular expression match is to
>    be replaced by a new string, that the new string is constructed using
>    the rules used by the ECMAScript replace function in ECMA-262,
>    ECMAScript Language Specification, Chapter 15 part 5.4.11
>    String.prototype.replace. (FWD.1).
>    This is functionally identical to the Perl format string rules.
> 
> But the "Perl format" treats backslashes as a character one must escape,
> which contradicts the preceding sentence.

The reference to ECMAScript may be wrong, but boost behaves as documented: 
"Perl format string rules" in the original text at 
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/match_flag_type.html
 
is a link pointing to 
http://www.boost.org/doc/libs/master/libs/regex/doc/html/boost_regex/format/perl_format.html.

That page states that a backslash represents the literal character following 
the backslash, unless the following character is listed as exception. If we 
read the expression "\$&" (as seen by the regex engine, in C++ source it 
would be "\\$&") from left to right we see a backslash, followed by a dollar 
sign. The dollar sign is not listed as exception => \$ is read as a literal 
$, and the back reference $& is not recognized.

BTW, references to Perl syntax in boost::regex do not mean 100% perl 
compatibilty, but the boost::regex "perl" flavour, which is "based on that 
used by the programming language Perl" (see 
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).

std::regex behaves as documented as well: Escape sequences for backslashes 
are not mentioned for the default format (see e.g. 
http://www.cplusplus.com/reference/regex/regex_replace/ or 
https://msdn.microsoft.com/en-us/library/bb982727.aspx), so backslashes are 
always taken literally. I did some experiments with gcc 4.9 and MSVC 
(compiler version: 19.00.23427.0(x86) using the online service at 
http://webcompiler.cloudapp.net/ for the latter), and they support this 
understanding.

Therefore, we simply have two regex implementations that treat the format 
string of regex_replace() according to different rules. Fortunately, this 
seems to be the only difference we need to care of.

> On my side master still works fine with Boost so I don't see an inverted
> situation. But, if we did the correct thing and changed the format
> string to \$& as it should be, we would indeed end up in a symmetric
> situation.

Yes, this was my misunderstanding that perl and ECMAScript would be 
different. After much more reading and testing I see this now as well.

> I am afraid that for now there are only two unpleasant solutions:
> * fix this particular replacement using a hack, or
> * use Boost on every platform until we can get rid of it entirely.

I went through all other uses of regex_replace(), and after one fix all 
other uses work with both implementations. Therefore, I would simply fix the 
situation with conditional compilation. If we go back to boost, we will have 
the same problems only later when we switch to C++11 in 2.3.

It is now also clear that the MSVC workaround is not needed, so I deleted 
it. Is the attached patch OK to go in?


Georg
diff --git a/src/frontends/qt4/GuiCitation.cpp b/src/frontends/qt4/GuiCitation.cpp
index dc41fbd..8856ebc 100644
--- a/src/frontends/qt4/GuiCitation.cpp
+++ b/src/frontends/qt4/GuiCitation.cpp
@@ -670,11 +670,19 @@ static docstring escape_special_chars(docstring const & expr)
 
 	// $& is an ECMAScript format expression that expands to all
 	// of the current match
-	// The '$' must be prefixed with the escape character '\' for
-	// boost to treat it as a literal.
-	// Thus, to prefix a matched expression with '\', we use:
+#if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
+	// To prefix a matched expression with a single literal backslash, we
+	// need to escape it for the C++ compiler and use:
+	// FIXME: UNICODE
+	return from_utf8(lyx::regex_replace(to_utf8(expr), reg, string("\\$&")));
+#else
+	// A backslash in the format string starts an escape sequence in boost.
+	// Thus, to prefix a matched expression with a single literal backslash,
+	// we need to give two backslashes to the regex engine, and escape both
+	// for the C++ compiler and use:
 	// FIXME: UNICODE
 	return from_utf8(lyx::regex_replace(to_utf8(expr), reg, string("\\\\$&")));
+#endif
 }
 
 
diff --git a/src/frontends/tests/biblio.cpp b/src/frontends/tests/biblio.cpp
index 5a7a376..4ba1d40 100644
--- a/src/frontends/tests/biblio.cpp
+++ b/src/frontends/tests/biblio.cpp
@@ -20,10 +20,17 @@ string const escape_special_chars(string const & expr)
 
 	// $& is a ECMAScript format expression that expands to all
 	// of the current match
-	// The '$' must be prefixed with the escape character '\' for
-	// boost to treat it as a literal.
-	// Thus, to prefix a matched expression with '\', we use:
+#if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
+	// To prefix a matched expression with a single literal backslash, we
+	// need to escape it for the C++ compiler and use:
+	return lyx::regex_replace(expr, reg, "\\$&");
+#else
+	// A backslash in the format string starts an escape sequence in boost.
+	// Thus, to prefix a matched expression with a single literal backslash,
+	// we need to give two backslashes to the regex engine, and escape both
+	// for the C++ compiler and use:
 	return lyx::regex_replace(expr, reg, "\\\\$&");
+#endif
 }
 
 
diff --git a/src/support/regex.h b/src/support/regex.h
index 94140a5..fd6f1e5 100644
--- a/src/support/regex.h
+++ b/src/support/regex.h
@@ -14,74 +14,20 @@
 
 #if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
 #  include <regex>
-#  ifdef _MSC_VER
-namespace lyx {
-  // inheriting 'private' to see which functions are used and if there are
-  // other ECMAScript defaults
-  // FIXME: Is this really needed?
-  //        If yes, then the MSVC regex implementation is not standard-conforming.
-  class regex : private std::regex
-  {
-  public:
-    regex() {}
-    regex(const regex& rhs) : std::regex(rhs) {}
-    template<class T>
-    regex(T t) : std::regex(t, std::regex_constants::grep) {}
-    template<class T>
-    void assign(T t) { std::regex::assign(t, std::regex_constants::grep); }
-    template<class T, class V>
-    void assign(T t, V v) { std::regex::assign(t, v); }
-    const std::regex& toStd() const { return *this; }
-  };
-  template<class T>
-  bool regex_match(T t, const regex& r) { return std::regex_match(t, r.toStd()); }
-  template<class T>
-  bool regex_match(T t, std::smatch& m, const regex& r) { return std::regex_match(t, m, r.toStd()); }
-  template<class T, class V>
-  bool regex_match(T t, V v, std::smatch& m, const regex& r) { return std::regex_match(t, v, m, r.toStd()); }
-  template<class T, class V>
-  std::string regex_replace(T t, const regex& r, V v) { return std::regex_replace(t, r.toStd(), v); }
-  template<class T, class V, class U, class H>
-  T regex_replace(T t, V v, U u, const regex& r, H h) { return std::regex_replace(t, v, u, r.toStd(), h); }
-  template<class T>
-  bool regex_search(T t, const regex& r) { return std::regex_search(t, r.toStd()); }
-  template<class T>
-  bool regex_search(T t, std::smatch& m, const regex& r) { return std::regex_search(t, m, r.toStd()); }
-  template<class T, class V>
-  bool regex_search(T t, V v, std::smatch& m, const regex& r) { return std::regex_search(t, v, m, r.toStd()); }
-
-  struct sregex_iterator : std::sregex_iterator
-  {
-    sregex_iterator() {}
-    template<class T, class V>
-    sregex_iterator(T t, V v, const regex& r) : std::sregex_iterator(t, v, r.toStd()) {}
-  };
-}
-#  else
 // <regex> in gcc is unusable in versions less than 4.9.0
 // see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631
 #  define LR_NS std
-namespace lyx {
-using LR_NS::regex;
-using LR_NS::regex_match;
-using LR_NS::regex_replace;
-using LR_NS::regex_search;
-using LR_NS::sregex_iterator;
-}
-#  endif
 #else
 #  include <boost/regex.hpp>
 #  define LR_NS boost
+#endif
+
 namespace lyx {
 using LR_NS::regex;
 using LR_NS::regex_match;
 using LR_NS::regex_replace;
 using LR_NS::regex_search;
 using LR_NS::sregex_iterator;
-}
-#endif
-
-namespace lyx {
 using LR_NS::smatch;
 using LR_NS::basic_regex;
 using LR_NS::regex_error;

Reply via email to