Le 22/11/2015 17:38, Georg Baum a écrit :
Guillaume Munch wrote:

Le 20/11/2015 19:43, Georg Baum a écrit :

It does. The reason is that the output of escape_special_chars() changes
when compiling with std::regex. The attached patch would fix the test for
std::regex, but then make check would fail when using boost::regex.

Of course, the source should be corrected, not the test.

Sure. This was to demonstrate the differences.

BTW I read the new documentation for tests and it did not help me at
all. It said nothing for "make check", and when I run "make check" it is
still a mystery to me where I can get any log file that says what went
wrong. (The file test_biblio.log is always empty.)

Yes, these tests are unfortunately not documented yet. Apart from that: The
autotools test machinery is too complicated. I do not know an equivalent for
make check, but you can run the equivalent of make alltests also with cmake
which I would recommend:

cd $BUILDTREE/src/test
make test

This shows that the regex engines we use do still interpret the
expressions in different ways, and I believe I also know the reason: The
expressions have changed to ECMAScript syntax, but the boost::regex
engine does not default to ECMAScript, but to boost perl style.

But "perl" for Boost is just a synonym for "ECMAScript", and the
difference between std and boost in the current master is only
supposed to be a few perl extensions added. (See

<http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/syntax_option_type/syntax_option_type_perl.html>.)

Yes, I realized this now as well after reading the sources. I was lead on
the wrong track by the MSVC workaround in regex.h, and the fact that
boost::regex also has some undocumented features (like the one I got rid of
with dfe3a7d9fc57)

Therefore the problem is probably still in the expressions. What I
understand from your patch is that the output in std::regex is still
wrong: the regex adds \\ instead of \. The result of the test you showed
with the patch. I am worried by the line:

    return lyx::regex_replace(expr, reg, "\\\\$&");

This should be:

    return lyx::regex_replace(expr, reg, "\\$&");

but then this would no longer work with Boost, because it interprets the
backslash as an escape sequence... This appears to be a bug in Boost:

    format_default: Specifies that when a regular expression match is to
    be replaced by a new string, that the new string is constructed using
    the rules used by the ECMAScript replace function in ECMA-262,
    ECMAScript Language Specification, Chapter 15 part 5.4.11
    String.prototype.replace. (FWD.1).
    This is functionally identical to the Perl format string rules.

But the "Perl format" treats backslashes as a character one must escape,
which contradicts the preceding sentence.

The reference to ECMAScript may be wrong,

So the documentation is wrong.

but boost behaves as documented:
"Perl format string rules" in the original text at
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/ref/match_flag_type.html
is a link pointing to
http://www.boost.org/doc/libs/master/libs/regex/doc/html/boost_regex/format/perl_format.html.

That page states that a backslash represents the literal character following
the backslash, unless the following character is listed as exception. If we
read the expression "\$&" (as seen by the regex engine, in C++ source it
would be "\\$&") from left to right we see a backslash, followed by a dollar
sign. The dollar sign is not listed as exception => \$ is read as a literal
$, and the back reference $& is not recognized.

BTW, references to Perl syntax in boost::regex do not mean 100% perl
compatibilty, but the boost::regex "perl" flavour, which is "based on that
used by the programming language Perl" (see
http://www.boost.org/doc/libs/1_57_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html).

std::regex behaves as documented as well: Escape sequences for backslashes
are not mentioned for the default format (see e.g.
http://www.cplusplus.com/reference/regex/regex_replace/ or
https://msdn.microsoft.com/en-us/library/bb982727.aspx), so backslashes are
always taken literally. I did some experiments with gcc 4.9 and MSVC
(compiler version: 19.00.23427.0(x86) using the online service at
http://webcompiler.cloudapp.net/ for the latter), and they support this
understanding.

Therefore, we simply have two regex implementations that treat the format
string of regex_replace() according to different rules. Fortunately, this
seems to be the only difference we need to care of.

On my side master still works fine with Boost so I don't see an inverted
situation. But, if we did the correct thing and changed the format
string to \$& as it should be, we would indeed end up in a symmetric
situation.

Yes, this was my misunderstanding that perl and ECMAScript would be
different. After much more reading and testing I see this now as well.

And my hope was that ECMAScript would not be identified with perl without having real compatibility...


I am afraid that for now there are only two unpleasant solutions:
* fix this particular replacement using a hack, or
* use Boost on every platform until we can get rid of it entirely.

I went through all other uses of regex_replace(), and after one fix all
other uses work with both implementations. Therefore, I would simply fix the
situation with conditional compilation. If we go back to boost, we will have
the same problems only later when we switch to C++11 in 2.3.

Sounds good.


It is now also clear that the MSVC workaround is not needed, so I deleted
it. Is the attached patch OK to go in?

Looks good (I trust that the small details are ok). I have the same understanding. Isn't the "FIXME: Unicode" trivial to fix btw?


Guillaume

Reply via email to