Guillaume Munch wrote:
> Le 20/11/2015 19:43, Georg Baum a écrit :
>>
>> It does. The reason is that the output of escape_special_chars() changes
>> when compiling with std::regex. The attached patch would fix the test for
>> std::regex, but then make check would fail when using boost::regex.
>
> Of course, the source should be corrected, not the test.
Sure. This was to demonstrate the differences.
> BTW I read the new documentation for tests and it did not help me at
> all. It said nothing for "make check", and when I run "make check" it is
> still a mystery to me where I can get any log file that says what went
> wrong. (The file test_biblio.log is always empty.)
Yes, these tests are unfortunately not documented yet. Apart from that: The
autotools test machinery is too complicated. I do not know an equivalent for
make check, but you can run the equivalent of make alltests also with cmake
which I would recommend:
cd $BUILDTREE/src/test
make test
>> This shows that the regex engines we use do still interpret the
>> expressions in different ways, and I believe I also know the reason: The
>> expressions have changed to ECMAScript syntax, but the boost::regex
>> engine does not default to ECMAScript, but to boost perl style.
>
> But "perl" for Boost is just a synonym for "ECMAScript", and the
> difference between std and boost in the current master is only
> supposed to be a few perl extensions added. (See
>
<http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_perl.html>.)
Yes, I realized this now as well after reading the sources. I was lead on
the wrong track by the MSVC workaround in regex.h, and the fact that
boost::regex also has some undocumented features (like the one I got rid of
with dfe3a7d9fc57)
> Therefore the problem is probably still in the expressions. What I
> understand from your patch is that the output in std::regex is still
> wrong: the regex adds \\ instead of \. The result of the test you showed
> with the patch. I am worried by the line:
>
> return lyx::regex_replace(expr, reg, "\\\\$&");
>
> This should be:
>
> return lyx::regex_replace(expr, reg, "\\$&");
>
> but then this would no longer work with Boost, because it interprets the
> backslash as an escape sequence... This appears to be a bug in Boost:
>
> format_default: Specifies that when a regular expression match is to
> be replaced by a new string, that the new string is constructed using
> the rules used by the ECMAScript replace function in ECMA-262,
> ECMAScript Language Specification, Chapter 15 part 5.4.11
> String.prototype.replace. (FWD.1).
> This is functionally identical to the Perl format string rules.
>
> But the "Perl format" treats backslashes as a character one must escape,
> which contradicts the preceding sentence.
The reference to ECMAScript may be wrong, but boost behaves as documented:
"Perl format string rules" in the original text at
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/match_flag_type.html
is a link pointing to
http://www.boost.org/doc/libs/master/libs/regex/doc/html/boost_regex/format/perl_format.html.
That page states that a backslash represents the literal character following
the backslash, unless the following character is listed as exception. If we
read the expression "\$&" (as seen by the regex engine, in C++ source it
would be "\\$&") from left to right we see a backslash, followed by a dollar
sign. The dollar sign is not listed as exception => \$ is read as a literal
$, and the back reference $& is not recognized.
BTW, references to Perl syntax in boost::regex do not mean 100% perl
compatibilty, but the boost::regex "perl" flavour, which is "based on that
used by the programming language Perl" (see
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).
std::regex behaves as documented as well: Escape sequences for backslashes
are not mentioned for the default format (see e.g.
http://www.cplusplus.com/reference/regex/regex_replace/ or
https://msdn.microsoft.com/en-us/library/bb982727.aspx), so backslashes are
always taken literally. I did some experiments with gcc 4.9 and MSVC
(compiler version: 19.00.23427.0(x86) using the online service at
http://webcompiler.cloudapp.net/ for the latter), and they support this
understanding.
Therefore, we simply have two regex implementations that treat the format
string of regex_replace() according to different rules. Fortunately, this
seems to be the only difference we need to care of.
> On my side master still works fine with Boost so I don't see an inverted
> situation. But, if we did the correct thing and changed the format
> string to \$& as it should be, we would indeed end up in a symmetric
> situation.
Yes, this was my misunderstanding that perl and ECMAScript would be
different. After much more reading and testing I see this now as well.
> I am afraid that for now there are only two unpleasant solutions:
> * fix this particular replacement using a hack, or
> * use Boost on every platform until we can get rid of it entirely.
I went through all other uses of regex_replace(), and after one fix all
other uses work with both implementations. Therefore, I would simply fix the
situation with conditional compilation. If we go back to boost, we will have
the same problems only later when we switch to C++11 in 2.3.
It is now also clear that the MSVC workaround is not needed, so I deleted
it. Is the attached patch OK to go in?
Georg
diff --git a/src/frontends/qt4/GuiCitation.cpp b/src/frontends/qt4/GuiCitation.cpp
index dc41fbd..8856ebc 100644
--- a/src/frontends/qt4/GuiCitation.cpp
+++ b/src/frontends/qt4/GuiCitation.cpp
@@ -670,11 +670,19 @@ static docstring escape_special_chars(docstring const & expr)
// $& is an ECMAScript format expression that expands to all
// of the current match
- // The '$' must be prefixed with the escape character '\' for
- // boost to treat it as a literal.
- // Thus, to prefix a matched expression with '\', we use:
+#if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
+ // To prefix a matched expression with a single literal backslash, we
+ // need to escape it for the C++ compiler and use:
+ // FIXME: UNICODE
+ return from_utf8(lyx::regex_replace(to_utf8(expr), reg, string("\\$&")));
+#else
+ // A backslash in the format string starts an escape sequence in boost.
+ // Thus, to prefix a matched expression with a single literal backslash,
+ // we need to give two backslashes to the regex engine, and escape both
+ // for the C++ compiler and use:
// FIXME: UNICODE
return from_utf8(lyx::regex_replace(to_utf8(expr), reg, string("\\\\$&")));
+#endif
}
diff --git a/src/frontends/tests/biblio.cpp b/src/frontends/tests/biblio.cpp
index 5a7a376..4ba1d40 100644
--- a/src/frontends/tests/biblio.cpp
+++ b/src/frontends/tests/biblio.cpp
@@ -20,10 +20,17 @@ string const escape_special_chars(string const & expr)
// $& is a ECMAScript format expression that expands to all
// of the current match
- // The '$' must be prefixed with the escape character '\' for
- // boost to treat it as a literal.
- // Thus, to prefix a matched expression with '\', we use:
+#if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
+ // To prefix a matched expression with a single literal backslash, we
+ // need to escape it for the C++ compiler and use:
+ return lyx::regex_replace(expr, reg, "\\$&");
+#else
+ // A backslash in the format string starts an escape sequence in boost.
+ // Thus, to prefix a matched expression with a single literal backslash,
+ // we need to give two backslashes to the regex engine, and escape both
+ // for the C++ compiler and use:
return lyx::regex_replace(expr, reg, "\\\\$&");
+#endif
}
diff --git a/src/support/regex.h b/src/support/regex.h
index 94140a5..fd6f1e5 100644
--- a/src/support/regex.h
+++ b/src/support/regex.h
@@ -14,74 +14,20 @@
#if defined(LYX_USE_CXX11) && defined(LYX_USE_STD_REGEX)
# include <regex>
-# ifdef _MSC_VER
-namespace lyx {
- // inheriting 'private' to see which functions are used and if there are
- // other ECMAScript defaults
- // FIXME: Is this really needed?
- // If yes, then the MSVC regex implementation is not standard-conforming.
- class regex : private std::regex
- {
- public:
- regex() {}
- regex(const regex& rhs) : std::regex(rhs) {}
- template<class T>
- regex(T t) : std::regex(t, std::regex_constants::grep) {}
- template<class T>
- void assign(T t) { std::regex::assign(t, std::regex_constants::grep); }
- template<class T, class V>
- void assign(T t, V v) { std::regex::assign(t, v); }
- const std::regex& toStd() const { return *this; }
- };
- template<class T>
- bool regex_match(T t, const regex& r) { return std::regex_match(t, r.toStd()); }
- template<class T>
- bool regex_match(T t, std::smatch& m, const regex& r) { return std::regex_match(t, m, r.toStd()); }
- template<class T, class V>
- bool regex_match(T t, V v, std::smatch& m, const regex& r) { return std::regex_match(t, v, m, r.toStd()); }
- template<class T, class V>
- std::string regex_replace(T t, const regex& r, V v) { return std::regex_replace(t, r.toStd(), v); }
- template<class T, class V, class U, class H>
- T regex_replace(T t, V v, U u, const regex& r, H h) { return std::regex_replace(t, v, u, r.toStd(), h); }
- template<class T>
- bool regex_search(T t, const regex& r) { return std::regex_search(t, r.toStd()); }
- template<class T>
- bool regex_search(T t, std::smatch& m, const regex& r) { return std::regex_search(t, m, r.toStd()); }
- template<class T, class V>
- bool regex_search(T t, V v, std::smatch& m, const regex& r) { return std::regex_search(t, v, m, r.toStd()); }
-
- struct sregex_iterator : std::sregex_iterator
- {
- sregex_iterator() {}
- template<class T, class V>
- sregex_iterator(T t, V v, const regex& r) : std::sregex_iterator(t, v, r.toStd()) {}
- };
-}
-# else
// <regex> in gcc is unusable in versions less than 4.9.0
// see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53631
# define LR_NS std
-namespace lyx {
-using LR_NS::regex;
-using LR_NS::regex_match;
-using LR_NS::regex_replace;
-using LR_NS::regex_search;
-using LR_NS::sregex_iterator;
-}
-# endif
#else
# include <boost/regex.hpp>
# define LR_NS boost
+#endif
+
namespace lyx {
using LR_NS::regex;
using LR_NS::regex_match;
using LR_NS::regex_replace;
using LR_NS::regex_search;
using LR_NS::sregex_iterator;
-}
-#endif
-
-namespace lyx {
using LR_NS::smatch;
using LR_NS::basic_regex;
using LR_NS::regex_error;