Re: [cmake-developers] Making your regular expression engine more reliable

2017-05-19 Thread Brad King
On 05/18/2017 03:44 PM, Alan W. Irwin wrote:
> I have just discovered a long-standing regular expression bug (see
> ) that has been
> around since at least 3.0.2.

Not to distract from the rest of the discussion in this thread, but that
particular issue has nothing to do with the regex engine.  It is purely
a problem with the way `string(REGEX REPLACE)` is implemented.  Maybe
a better regex engine could help fix the implementation, but this is
not due to a bug in the current regex engine.

-Brad

-- 

Powered by www.kitware.com

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Kitware offers various services to support the CMake community. For more 
information on each offering, please visit:

CMake Support: http://cmake.org/cmake/help/support.html
CMake Consulting: http://cmake.org/cmake/help/consulting.html
CMake Training Courses: http://cmake.org/cmake/help/training.html

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/cmake-developers


Re: [cmake-developers] Making your regular expression engine more reliable

2017-05-19 Thread Ben Boeckel
On Thu, May 18, 2017 at 12:44:57 -0700, Alan W. Irwin wrote:
> So your unit tests for regular expressions obviously missed at least
> this issue. I have no idea what those unit tests are (or even if they
> exist), but one possibility for attempting to wring most of the bugs out
> of your regular expression processor is to adapt some other project's
> regexp test suite. See
> 
> for a rather large list of such test suites.

This would be a great addition to CMake and would help with a future
replacement to ensure that we are compatible with what is being used
now.
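
A minimal sketch of what such a compatibility table could look like, in C++. Everything here is an assumption made for illustration: `RegexCase`, `count_mismatches`, and `sample_cases` are hypothetical names, `std::regex` with the POSIX extended grammar merely stands in for a candidate replacement engine, and the sample rows were not recorded from CMake. In a real suite the expected values would be captured from the current engine.

```cpp
#include <cassert>  // only needed for the check shown in the note below
#include <regex>
#include <string>
#include <vector>

// One row of a hypothetical compatibility table: a pattern, an input,
// and the result the current engine is expected to produce.
struct RegexCase {
  std::string pattern;
  std::string input;
  bool expect_match;
  std::string expect_text;  // text of the whole match; empty when no match
};

// Run every case against a candidate engine (std::regex with the POSIX
// extended grammar is a stand-in here) and count the rows that disagree.
inline int count_mismatches(const std::vector<RegexCase>& cases) {
  int failures = 0;
  for (const RegexCase& c : cases) {
    const std::regex re(c.pattern, std::regex::extended);
    std::smatch m;
    const bool matched = std::regex_search(c.input, m, re);
    if (matched != c.expect_match ||
        (matched && m.str(0) != c.expect_text)) {
      ++failures;
    }
  }
  return failures;
}

// Illustrative rows only; not taken from CMake's behavior or test suite.
inline std::vector<RegexCase> sample_cases() {
  return {
      {"^ab+c", "abbbc-tail", true, "abbbc"},
      {"[0-9]+", "version 3.0.2", true, "3"},
      {"(foo|bar)$", "crowbar", true, "bar"},
      {"^x", "abc", false, ""},
  };
}
```

Under these assumptions, `assert(count_mismatches(sample_cases()) == 0);` would pass, and any row whose expectation differs from the candidate engine's result raises the count by one.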

> Another possibility is simply to forget supporting your own regexp
> engine and adopt someone else's very well regarded regexp engine (such
> as libprng).  I vaguely recall that has been suggested before, but
> since that hasn't happened I presume inertia or NIH syndrome won or
> else there was some strong reason why you didn't go that route.

This has been brought up before. The regex engine used in CMake is *old*
and, after my performance fixes a few years ago, is (or, at least, was)
near the top of that list for various reasons.

The biggest problem facing a replacement is the backwards compatibility.
Do we want to just use a standard set of features (C++14?) and break an
unknown number of `MATCHES` expressions? This would require a policy and
the old code would still lurk so that we can support the OLD policy.

Or do we translate from what we support now into a standard language and
then use another engine for that? This is probably the better solution
for CMake, but is probably more code than the current engine.

> From my perspective as a strongly interested CMake user (but not a
> CMake developer or regexp guru) that wants a completely reliable
> regular expression engine for CMake, I don't care which of these two
> approaches you use to achieve that goal.  But I hope my starting
> this topic here will facilitate reaching that goal.

I understand the want for a better engine, but just replacing it
outright isn't really an option.

Thanks,

--Ben


Re: [cmake-developers] Making your regular expression engine more reliable

2017-05-19 Thread Brad King
On 05/18/2017 03:44 PM, Alan W. Irwin wrote:
> Another possibility is simply to forget supporting your own regexp
> engine and adopt someone else's very well regarded regexp engine (such
> as libprng).  I vaguely recall that has been suggested before, but
> since that hasn't happened I presume inertia or NIH syndrome won or
> else there was some strong reason why you didn't go that route.

When CMake started there were very few BSD-licensed regex implementations.
The one we currently use was one of the few available at the time.  It is
not something we wrote ourselves.

The regex syntax is public-facing in CMake's interface, both in CMake
language code and on the command line.  Therefore any replacement will
have to interpret the input expressions exactly the same way as they are
interpreted now (except for ()-group limits and corner case bugs).

There was discussion in a thread in Nov 2011 about replacement strategies:

* https://cmake.org/pipermail/cmake-developers/2011-November/014249.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014277.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014291.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014294.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014353.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014362.html
* https://cmake.org/pipermail/cmake-developers/2011-November/014376.html ***
* https://cmake.org/pipermail/cmake-developers/2011-November/014379.html

The one I marked `***` is the current state of the design space AFAIK.

I think the path forward is some combination of:

* Replace the implementation of the old regex syntax by transforming
  old expressions to use some new regex engine such that matching is
  identical.  This can be done regardless of whether the new engine's
  syntax is made public-facing (by the following steps).

* Use `(?#TRE)` or a similar prefix in expressions to explicitly request
  a new regex impl for them.  The name "TRE" in this example corresponds
  to one particular regex library that was discussed in the above thread.
  Other names/libs could be used so long as they are BSD-licensed.
  Or, we could consider using `<regex>` from the C++ standard library,
  but that will have to wait until we finish modernizing CMake's
  requirements for C++ compilers.

* Use a policy to switch all regexes within CMake language code.
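
The prefix idea in the list above can be sketched as a small dispatch step in C++. This is only an illustration under stated assumptions: `Engine`, `Selection`, and `select_engine` are hypothetical names, only the literal prefix `(?#TRE)` from the example is recognized, and nothing here reflects existing CMake code.

```cpp
#include <cassert>  // only needed for the check shown in the note below
#include <string>

// Which engine an expression asks for. "Legacy" is the current syntax;
// "Tre" stands for the library named by the example prefix.
enum class Engine { Legacy, Tre };

struct Selection {
  Engine engine;
  std::string pattern;  // the expression with any engine prefix removed
};

// Split an optional "(?#TRE)" prefix off the front of an expression.
// Anything without that exact prefix is left untouched and routed to
// the legacy engine, so existing expressions keep their meaning.
inline Selection select_engine(const std::string& expr) {
  static const std::string prefix = "(?#TRE)";
  if (expr.compare(0, prefix.size(), prefix) == 0) {
    return {Engine::Tre, expr.substr(prefix.size())};
  }
  return {Engine::Legacy, expr};
}
```

With this sketch, `assert(select_engine("(?#TRE)a+b").pattern == "a+b");` holds, while `^foo$` is returned unchanged with `Engine::Legacy`. Defaulting to the legacy engine is the design choice that keeps old expressions interpreted exactly as before.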

If anyone is interested in working on this please post.

-Brad



Re: [cmake-developers] Making your regular expression engine more reliable

2017-05-19 Thread Sebastian Holtermann


Am 18.05.2017 um 23:07 schrieb Domen Vrankar:
> 2017-05-18 21:44 GMT+02:00 Alan W. Irwin:
>
> > I have just discovered a long-standing regular expression bug (see
> > ) that has been
> > around since at least 3.0.2.
> >
> > So your unit tests for regular expressions obviously missed at least
> > this issue. I have no idea what those unit tests are (or even if they
> > exist), but one possibility for attempting to wring most of the bugs out
> > of your regular expression processor is to adapt some other project's
> > regexp test suite. See
> >
> > for a rather large list of such test suites.
> >
> > Another possibility is simply to forget supporting your own regexp
> > engine and adopt someone else's very well regarded regexp engine (such
> > as libprng).  I vaguely recall that has been suggested before, but
> > since that hasn't happened I presume inertia or NIH syndrome won or
> > else there was some strong reason why you didn't go that route.
>
> There's a third option that comes to mind - I remember that a while back
> there was talk about TR1 becoming a requirement for building CMake so
> TR1 regex library could be exposed (probably just ECMAScript version).

+1

There are more limitations in the current regexp implementation.

1) It uses global variables that store only the result of
   the latest evaluation. This makes it impossible to access the
   matches of two or more cmsys::RegularExpression instances.

2) Because of the global variables cmsys::RegularExpression
   is not thread safe.

There are no threads used in CMake as far as I can tell
from a quick code search.
But there are some places in the AUTOGEN parts that could be
parallelized if regular expressions were thread safe (and threads
were available in CMake).
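
For contrast with the first limitation, here is a minimal C++ sketch of an interface in which every search writes into its own result object, so two sets of matches can be held at once. It uses `std::regex` and `std::smatch` from the standard library, not `cmsys::RegularExpression`; the function name `match_both` and the two patterns are made up for illustration.

```cpp
#include <cassert>  // only needed for the check shown in the note below
#include <regex>
#include <string>
#include <utility>

// Search one line with two independent expressions and keep both
// results. Each std::smatch owns its own capture data, so the second
// search does not overwrite the first and no global state is involved.
inline std::pair<std::string, std::string> match_both(const std::string& line) {
  static const std::regex key_re("([a-z]+)=");
  static const std::regex value_re("=([0-9]+)");

  std::smatch key_match;
  std::smatch value_match;
  const bool has_key = std::regex_search(line, key_match, key_re);
  const bool has_value = std::regex_search(line, value_match, value_re);

  // Both match objects are still valid here and can be read side by side.
  return {has_key ? key_match.str(1) : std::string(),
          has_value ? value_match.str(1) : std::string()};
}
```

For the input `jobs=4`, `assert(match_both("jobs=4").first == "jobs");` holds and the second element is `4`. Because all match state lives in local objects, a design like this is also a more natural starting point for the parallelization mentioned above.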

-Sebastian


Re: [cmake-developers] Making your regular expression engine more reliable

2017-05-18 Thread Domen Vrankar
2017-05-18 21:44 GMT+02:00 Alan W. Irwin:

> I have just discovered a long-standing regular expression bug (see
> ) that has been
> around since at least 3.0.2.
>
> So your unit tests for regular expressions obviously missed at least
> this issue. I have no idea what those unit tests are (or even if they
> exist), but one possibility for attempting to wring most of the bugs out
> of your regular expression processor is to adapt some other project's
> regexp test suite. See
>  find-unit-tests-for-regular-expressions-in-multiple-languages>
> for a rather large list of such test suites.
>
> Another possibility is simply to forget supporting your own regexp
> engine and adopt someone else's very well regarded regexp engine (such
> as libprng).  I vaguely recall that has been suggested before, but
> since that hasn't happened I presume inertia or NIH syndrome won or
> else there was some strong reason why you didn't go that route.
>

There's a third option that comes to mind - I remember that a while back
there was talk about TR1 becoming a requirement for building CMake so TR1
regex library could be exposed (probably just ECMAScript version).

How far along is the TR1 idea to being implemented/accepted?

Regards,
Domen

[cmake-developers] Making your regular expression engine more reliable

2017-05-18 Thread Alan W. Irwin

I have just discovered a long-standing regular expression bug (see
) that has been
around since at least 3.0.2.

So your unit tests for regular expressions obviously missed at least
this issue. I have no idea what those unit tests are (or even if they
exist), but one possibility for attempting to wring most of the bugs out
of your regular expression processor is to adapt some other project's
regexp test suite. See

for a rather large list of such test suites.

Another possibility is simply to forget supporting your own regexp
engine and adopt someone else's very well regarded regexp engine (such
as libprng).  I vaguely recall that has been suggested before, but
since that hasn't happened I presume inertia or NIH syndrome won or
else there was some strong reason why you didn't go that route.


From my perspective as a strongly interested CMake user (but not a
CMake developer or regexp guru) that wants a completely reliable
regular expression engine for CMake, I don't care which of these two
approaches you use to achieve that goal.  But I hope my starting
this topic here will facilitate reaching that goal.

Alan
__
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__

Linux-powered Science
__