Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-29 Thread Alexandru Ciobanu
Hi,

We found a workaround that does not require any source code modifications. I 
added the description to the bug report:
  http://public.kitware.com/Bug/view.php?id=12381#c27872

In short, we filter out the longer lines in the build output, so that CTest 
does not spend hours regex-matching them.
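
The length-based filtering described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual workaround recorded in the bug report: the helper name dropLongLines and the maxLength threshold are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: keep only the lines that are short enough to be
// worth passing to the (slow) regex matcher. maxLength is an assumed
// threshold, not a value taken from the bug report.
std::vector<std::string> dropLongLines(const std::vector<std::string>& lines,
                                       std::size_t maxLength)
{
  std::vector<std::string> kept;
  for (const std::string& line : lines) {
    if (line.size() <= maxLength) {
      kept.push_back(line);
    }
  }
  return kept;
}
```

The trade-off is the one noted later in this message: very long lines (typically compiler invocations) disappear from the log.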

In a way, this is like the short circuit stuff that Bill Hoffman suggested 
earlier in this thread. 

On 2011-11-16, at 3:50 PM, Bill Hoffman wrote:

 A little off topic, but I am wondering if the ctest performance issue for 
 xcode could be fixed without changing the regex.  The problem with xcode is 
 that it spits out very verbose output.  I am wondering if some short circuit 
 stuff could be put in place.  Maybe do a string compare of the first bit of 
 every line to look for stuff that could not have an error in it, and only 
 if it might have an error do we pass it to the regex call. Basically, if we 
 could reduce the amount of data going into the regex stuff, it should work as 
 well as it does for other compilers.  The place that this code could go is 
 into cmakexbuild.cxx which already strips out all lines that start with 
 setenv.  Maybe even some hard coded stuff that looks for errors and only puts 
 those out.  i.e. only output lines from cmakexbuild.cxx if there are errors.


Since it is similar to the way that cmakexbuild.cxx filters out setenv lines, 
it would be possible to add this additional filter. The only drawback would be 
that the compiler invocation lines will not be in the logs anymore (since these 
are the longest lines that we eliminate).

If wanted, I can submit the small patch with the modifications to 
cmakexbuild.cxx.

sincerely,
Alex Ciobanu 
--

Powered by www.kitware.com

Visit other Kitware open-source projects at 
http://www.kitware.com/opensource/opensource.html

Please keep messages on-topic and check the CMake FAQ at: 
http://www.cmake.org/Wiki/CMake_FAQ

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/cgi-bin/mailman/listinfo/cmake-developers


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-29 Thread Bill Hoffman

On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote:

Hi,

We found a workaround that does not require any source code
modifications. I added the description to the bug report:
http://public.kitware.com/Bug/view.php?id=12381#c27872


Good, glad you are up and running again.
...

Since it is similar to the way that cmakexbuild.cxx filters out
setenv lines, it would be possible to add this additional filter.
The only drawback would be that the compiler invocation lines will
not be in the logs anymore (since these are the longest lines that we
eliminate).

If wanted, I can submit the small patch with the modifications to
cmakexbuild.cxx.


I am wondering if we might want to make this an option to cmakexbuild, 
something like cmakexbuild -skip-lines=/Developer/usr/bin, which you 
could put in your build command:


  SET (CTEST_BUILD_COMMAND "cmakexbuild -skip-lines=/Developer/usr/bin")

Then, if you are seeing significant performance issues, you could add this.
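
A minimal sketch of how such an option might decide which lines to drop, assuming it does a plain substring test. The -skip-lines flag is only a proposal in this thread, so the function name shouldSkipLine and the substring semantics are hypothetical:

```cpp
#include <string>

// Hypothetical predicate for a "-skip-lines=<text>" style option:
// returns true when the line contains the given text and should be
// dropped from the output. An empty skipText disables the filter.
bool shouldSkipLine(const std::string& line, const std::string& skipText)
{
  if (skipText.empty()) {
    return false;
  }
  return line.find(skipText) != std::string::npos;
}
```

A substring test is cheap compared with running every error and warning regex on the line, which is the point of the short circuit.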


-Bill


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-29 Thread James Bigler
On Tue, Nov 29, 2011 at 1:05 PM, Bill Hoffman bill.hoff...@kitware.com wrote:

 On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote:

 Hi,

 We found a workaround that does not require any source code
 modifications. I added the description to the bug report:
 http://public.kitware.com/Bug/view.php?id=12381#c27872

  Good, glad you are up and running again.
 ...



Does this mean we won't see any new and better regexp code in CMake?  I for
one was looking forward to a better regexp engine.

James

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-29 Thread David Cole
On Tue, Nov 29, 2011 at 3:44 PM, James Bigler jamesbig...@gmail.com wrote:
 On Tue, Nov 29, 2011 at 1:05 PM, Bill Hoffman bill.hoff...@kitware.com
 wrote:

 On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote:

 Hi,

 We found a workaround that does not require any source code
 modifications. I added the description to the bug report:
 http://public.kitware.com/Bug/view.php?id=12381#c27872

 Good, glad you are up and running again.
 ...


 Does this mean we won't see any new and better regexp code in CMake?  I for
 one was looking forward to a better regexp engine.

 James




I think we're going to take it slow and make it well-thought-out
before we put a new regex engine in... It will not be appearing in the
upcoming 2.8.7 since nothing is even in 'next' yet, and we are still
discussing the options.

We'll keep sleeping on it till it seems cooked enough before we put
something in 'next' to try out.

Thx,
David


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-24 Thread Alexander Neundorf
On Wednesday 23 November 2011, David Cole wrote:
 On Wed, Nov 23, 2011 at 2:09 PM, David Cole david.c...@kitware.com wrote:
  On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote:
  On 11/23/2011 12:51 PM, Brad King wrote:
  On 11/23/2011 12:48 PM, Brad King wrote:
  On 11/23/2011 12:43 PM, Brad King wrote:
  On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:
  The regex in question is:
  ^[^][:/*?]+\$
  
   To include a literal ] in the list, make it either the first item
  
  It must be the [: in this regex that TRE sees as special since it
  allows expressions like [:digit:] inside a bracket expression.
  
  Still, this is a case that my proposed policy would pick up.
  
  -Brad
  
  I am still very wary about this policy.  For 99% of folks the current
  regex is just fine.  Making them eventually change to get the new
  regex is making them do work that they don't need or want.  I would
  rather have two API's.   I just don't see the big upside of TRE, and I
  see this causing pain for lots and lots of folks if we push them to
  make the change.  CMake has most likely 100,000 or more users at this
  point.  A change like this could easily inflict a man years of effort
  onto the world, and should not be taken lightly.
  
  -Bill
  
  Big upside:(quoting from Alexandru Ciobanu's email of Nov. 17th
  earlier in this thread)
  
  The impact on the build time is pretty dramatic:
  CMake: 7h39m
  CMake + TRE: 1h06m
 
 And although there is a big upside, we do still have to be careful.
 
 We have to remember that regexes are used in the context of ctest -D
 invocations, ctest -S script running and cmake -P running, too, where
 policies are not really a reliable mechanism. So in addition to having
 a careful policy, we also have to decide what to do in those cases.
 The case that is in question here for the big performance gain is
 ctest running and filtering build output based on regexes. No cmake
 policy mechanism in sight for that scenario.

Also, AFAIK TRE supports more stuff than the current regexps. This is good.
But doesn't that mean that there could be existing regexps which right now 
don't have a special meaning, but with TRE suddenly get a special meaning 
and change the matching?

Does e.g. the following produce the same result? I get : as the match here.

set(text "hello world [:digit:]")
if ("${text}" MATCHES ".+([:digit:])")
  message(STATUS "match: \"${CMAKE_MATCH_1}\"")
endif()

Alex


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Bill Hoffman

On 11/22/2011 4:39 PM, Brad King wrote:


It is tempting to always require explicit requests for new TRE behavior,
such as using TRE instead of REGEX in keyword locations, but one
advantage of using a policy is that over time the old behavior will
disappear completely from usage.

I am pretty sure the last time we talked about adding a new regex we 
talked about requiring explicit requests.  I think this would be a much 
safer approach.  I am really scared that this regex will not be 
compatible with the old one, and it will break lots of stuff in very 
subtle ways that are hard for people to detect.  It is not that much 
code to have both.  Where performance is an issue, we can swap it out, 
and when people need better regex they can use TRE as well.  I don't 
think the pain will be worth getting rid of the old usage.


-Bill



Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread James Bigler
On Wed, Nov 23, 2011 at 8:36 AM, Bill Hoffman bill.hoff...@kitware.com wrote:

 On 11/22/2011 4:39 PM, Brad King wrote:

  It is tempting to always require explicit requests for new TRE behavior,
 such as using TRE instead of REGEX in keyword locations, but one
 advantage of using a policy is that over time the old behavior will
 disappear completely from usage.

  I am pretty sure the last time we talked about adding a new regex we
 talked about requiring explicit requests.  I think this would be a much
 safer approach.  I am really scared that this regex will not be compatible
 with the old one, and it will break lots of stuff in very subtle ways that
 are hard for people to detect.  It is not that much code to have both.
  Where performance is an issue, we can swap it out, and when people need
 better regex they can use TRE as well.  I don't think the pain will be
 worth getting rid of the old usage.

 -Bill




Why can't this be solved with a policy?  One problem of using an explicit
TRE command is that if you want to write code that *could* be used in an
older version of CMake you won't be able to use it.

I agree that making the usage explicit would allow for complete backward
compatibility, but it clutters the language.  Do you want to have two
versions of every regular expression syntax?  GLOB vs GLOB_TRE? MATCHES vs
MATCHES_TRE?

Another argument against using TRE explicitly is these words tell me
nothing about what that function does unless I'm extremely familiar with
the intricacies of CMake script.  I think we want to make CMake easier to
use, not harder.

James

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Bill Hoffman

On 11/23/2011 11:43 AM, James Bigler wrote:



Why can't this be solved with a policy?  One problem of using an
explicit TRE command is that if you want to write code that *could* be
used in an older version of CMake you won't be able to use it.

It could be, but that will not come without pain.  I can see a future 
where this policy is kept off in many projects.



I agree that making the usage explicit would allow for complete backward
compatibility, but it clutters the language.  Do you want to have two
versions of every regular expression syntax?  GLOB vs GLOB_TRE? MATCHES
vs MATCHES_TRE?

Another argument against using TRE explicitly is these words tell me
nothing about what that function does unless I'm extremely familiar with
the intricacies of CMake script.  I think we want to make CMake easier
to use, not harder.


For many folks the current regex is just fine, so they will not have to 
do anything.  Only the people that have hit some sort of wall with the 
regex will need to change.  If we have a policy, I can see projects 
changing the minimum required version and having a subtle bug show up 
because the regex changed.



I don't think we even know what all the differences are at this point...

-Bill


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 12:06 PM, Bill Hoffman wrote:
 On 11/23/2011 11:43 AM, James Bigler wrote:
 Why can't this be solved with a policy?  One problem of using an
 explicit TRE command is that if you want to write code that *could* be
 used in an older version of CMake you won't be able to use it.

 It could be, but that will not come without pain.  I can see a future 
 where this policy is kept off in many projects.

We can wrap both the TRE and KWSys regex implementations in a
cmRegularExpression that provides the same interface for both.
Then we can implement the policy at that level.  If the policy is
set to OLD, use the KWSys implementation.  If the policy is set
to NEW, use the TRE implementation.  If the policy is not set, then
do *both* implementations and compare the results of matching.  If
they are different warn about the policy not being set and favor
the KWSys result for compatibility.
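
As a rough sketch of the dispatch described above: the two engines are modeled here as plain callables, and the names DualRegex, PolicyState, and Matcher are invented for this illustration. None of this is existing CMake code.

```cpp
#include <functional>
#include <iostream>
#include <string>

enum class PolicyState { Old, New, Unset };

// A matcher takes (pattern, input) and reports whether the input matches.
using Matcher = std::function<bool(const std::string&, const std::string&)>;

// Hypothetical wrapper that selects an engine based on the policy setting.
struct DualRegex
{
  Matcher oldEngine; // stand-in for the KWSys implementation
  Matcher newEngine; // stand-in for the TRE implementation
  PolicyState policy = PolicyState::Unset;
  bool warned = false;

  bool matches(const std::string& pattern, const std::string& input)
  {
    if (policy == PolicyState::Old) {
      return oldEngine(pattern, input);
    }
    if (policy == PolicyState::New) {
      return newEngine(pattern, input);
    }
    // Policy not set: run both, warn on disagreement, prefer the old result.
    const bool oldResult = oldEngine(pattern, input);
    const bool newResult = newEngine(pattern, input);
    if (oldResult != newResult) {
      warned = true;
      std::cerr << "regex policy not set: engines disagree on \""
                << pattern << "\"\n";
    }
    return oldResult;
  }
};
```

Note that the unset case pays for both engines on every match, so it trades speed for the ability to detect differences.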

 I don't think we even know what all the differences are at this point...

The above approach would reveal differences without breaking projects.
Alex C. should produce some unit tests for the existing behavior
which we should have had all along.  Even though we probably cannot
think of everything every project is doing at least we will know up
front if something major is different.

Policies are meant for use when the new behavior is clearly better
than the old behavior in every way and should have been the way it
was always done had it been available from the beginning.  This is
such a case.  We are not switching between two desirable behaviors.
The only argument in favor of the old behavior is that it is how it
works now.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Alexandru Ciobanu
Hi Bill,

On 2011-11-23, at 10:36 AM, Bill Hoffman wrote:

 I am pretty sure the last time we talked about adding a new regex we talked 
 about requiring explicit requests.  I think this would be a much safer 
 approach.  I am really scared that this regex will not be compatible with the 
 old one, and it will break lots of stuff in very subtle ways that are hard 
 for people to detect.  It is not that much code to have both.  Where 
 performance is an issue, we can swap it out, and when people need better 
 regex they can use TRE as well.  I don't think the pain will be worth getting 
 rid of the old usage.

I agree with you on multiple points here.

[1]
Your intuition is right. Just minutes before reading your email I tried to 
compile ITK using CMake+TRE. And there was at least one regex that TRE refused 
to compile. 

So yes, we cannot use TRE for *all* regex needs in CMake.

[2]
  It is not that much code to have both. 
I agree, and I think Brad King does too.

The patch I submitted does exactly that:
- old regex code stays unchanged 
  --- Source/kwsys/RegularExpression.{cxx,hxx.in} 
- new TRE regex code added
  --- Utilities/cmtre

So, yes, we intend to keep both.

[3]
I think there's a simple solution to our problem. I think we can:
  - solve the performance problem
  - keep the old behaviour, i.e. not break any projects out there

The solution is to use the old regex code everywhere, except for the very 
specific place where it causes problems.

By looking at the bug report for 12381 
(http://public.kitware.com/Bug/view.php?id=12381), you can see that the only 
place we have to use TRE is:
   Source/CTest/cmCTestBuildHandler.cxx

What cmCTestBuildHandler does is very simple: for every single line of output 
from the build process it tries to match about 100 regular expressions, in 
order to find error or warning messages. These 100 regular expressions are 
defined in the same file in four static arrays, that look something like this:

static const char* cmCTestErrorMatches[] = {
  "^[Bb]us [Ee]rror",
  "^[Ss]egmentation [Vv]iolation",
  "^[Ss]egmentation [Ff]ault",
  ":.*[Pp]ermission [Dd]enied",
  "([^ :]+):([0-9]+): ([^ \\t])",
  "([^:]+): error[ \\t]*[0-9]+[ \\t]*:",
  // AND SO ON …
};

// + other 3 arrays

In this case, it is safe to use TRE, because we (the CMake developers) write 
these regular expressions, and we can make sure they work with TRE.

All other regular expressions, including those written by users in their 
CMakeLists.txt files, will run on the old regex code, and thus behave normally.
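
To make the per-line cost concrete, here is a minimal sketch of the matching loop, using std::regex as a stand-in engine (it is neither the KWSys nor the TRE implementation). The function names compileAll and firstMatch are made up for the example:

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Compile the patterns once, up front, rather than once per line.
std::vector<std::regex> compileAll(const std::vector<std::string>& patterns)
{
  std::vector<std::regex> compiled;
  for (const std::string& p : patterns) {
    compiled.emplace_back(p);
  }
  return compiled;
}

// Returns the index of the first pattern found in the line, or -1 if none.
// Every line that matches nothing pays for a search with every pattern.
int firstMatch(const std::vector<std::regex>& compiled, const std::string& line)
{
  for (std::size_t i = 0; i < compiled.size(); ++i) {
    if (std::regex_search(line, compiled[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With about 100 patterns and very long lines, the cost of a single search dominates, which is why swapping the engine in this one place can pay off.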

CONCLUSION:
  - we can keep old behaviour and solve the performance problem
  - solution in part [3]

If this solution is acceptable, I'll have to recreate the patch.

sincerely,
Alex Ciobanu

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 12:20 PM, Alexandru Ciobanu wrote:
 to compile ITK using CMake+TRE. And there was at least one regex that
 TRE refused to compile. 

What was it, and where in the ITK code is it?

Thanks,
-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Alexandru Ciobanu

On 2011-11-23, at 12:24 PM, Brad King wrote:

 On 11/23/2011 12:20 PM, Alexandru Ciobanu wrote:
 to compile ITK using CMake+TRE. And there was at least one regex that
 TRE refused to compile. 
 
 What was it, and where in the ITK code is it?

The regex in question is:
^[^][:/*?]+\$

And it appears at this location in the ITK source tree:
CMake/ExternalData.cmake:347

And the expression is correct, because you're allowed to have the ] 
metacharacter inside a [^xyz] class if it comes immediately after ^.

TRE does not do it the same way, see 
(http://laurikari.net/tre/documentation/regex-syntax/ the Bracket expressions 
section):

 To include a literal ] in the list, make it either the first item, the second 
 endpoint of a range, or enclose it in [. and .].

sincerely,
Alex

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:
 The regex in question is:
 ^[^][:/*?]+\$
 
 And it appears at this location in the ITK source tree:
 CMake/ExternalData.cmake:347
 
 And the expression is correct, because you're allowed to have the ]
 metacharacter inside a [^xyz] class if it comes immediately after ^.

Ironically I was the one that wrote that regex ;)

 TRE does not do it the same way, see
 (http://laurikari.net/tre/documentation/regex-syntax/ the Bracket
 expressions section):

Interesting, thanks.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 12:43 PM, Brad King wrote:
 On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:
 The regex in question is:
 ^[^][:/*?]+\$
 
 And it appears at this location in the ITK source tree:
 CMake/ExternalData.cmake:347
 
 And the expression is correct, because you're allowed to have the ]
 metacharacter inside a [^xyz] class if it comes immediately after ^.
 
 Ironically I was the one that wrote that regex ;)
 
 TRE does not do it the same way, see
 (http://laurikari.net/tre/documentation/regex-syntax/ the Bracket
 expressions section):

Wait, that documentation does say the same thing:

 bracket-expression ::= "[" item+ "]"
                    |   "[^" item+ "]"

 To include a literal ] in the list, make it either the first item

That's exactly what this regex does.  It uses the second production
rule in the above grammar fragment and puts the ']' first after '^'.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 12:48 PM, Brad King wrote:
 On 11/23/2011 12:43 PM, Brad King wrote:
 On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:
 The regex in question is:
 ^[^][:/*?]+\$
 
  To include a literal ] in the list, make it either the first item

It must be the [: in this regex that TRE sees as special since it
allows expressions like [:digit:] inside a bracket expression.

Still, this is a case that my proposed policy would pick up.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Bill Hoffman

On 11/23/2011 12:51 PM, Brad King wrote:

On 11/23/2011 12:48 PM, Brad King wrote:

On 11/23/2011 12:43 PM, Brad King wrote:

On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:

The regex in question is:
 ^[^][:/*?]+\$


  To include a literal ] in the list, make it either the first item


It must be the [: in this regex that TRE sees as special since it
allows expressions like [:digit:] inside a bracket expression.

Still, this is a case that my proposed policy would pick up.

-Brad

I am still very wary about this policy.  For 99% of folks the current 
regex is just fine.  Making them eventually change to get the new 
regex is making them do work that they don't need or want.  I would 
rather have two APIs.  I just don't see the big upside of TRE, and I 
see this causing pain for lots and lots of folks if we push them to make 
the change.  CMake most likely has 100,000 or more users at this point. 
A change like this could easily inflict man-years of effort onto the 
world, and should not be taken lightly.


-Bill


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread David Cole
On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote:
 On 11/23/2011 12:51 PM, Brad King wrote:

 On 11/23/2011 12:48 PM, Brad King wrote:

 On 11/23/2011 12:43 PM, Brad King wrote:

 On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:

 The regex in question is:
     ^[^][:/*?]+\$

  To include a literal ] in the list, make it either the first item

 It must be the [: in this regex that TRE sees as special since it
 allows expressions like [:digit:] inside a bracket expression.

 Still, this is a case that my proposed policy would pick up.

 -Brad

 I am still very wary about this policy.  For 99% of folks the current regex
 is just fine.  Making them eventually change to get the new regex is
 making them do work that they don't need or want.  I would rather have two
 API's.   I just don't see the big upside of TRE, and I see this causing pain
 for lots and lots of folks if we push them to make the change.  CMake has
 most likely 100,000 or more users at this point.  A change like this could
 easily inflict a man years of effort onto the world, and should not be taken
 lightly.

 -Bill


Big upside (quoting from Alexandru Ciobanu's email of Nov. 17th,
earlier in this thread):

The impact on the build time is pretty dramatic:
 CMake: 7h39m
 CMake + TRE: 1h06m
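
For scale, the two timings above work out to roughly a 7x speedup. A small sketch of the arithmetic, with helper names chosen only for this example:

```cpp
// Convert an "XhYYm" duration to minutes.
constexpr int toMinutes(int hours, int minutes)
{
  return hours * 60 + minutes;
}

constexpr int kWithoutTre = toMinutes(7, 39); // 459 minutes
constexpr int kWithTre = toMinutes(1, 6);     // 66 minutes

// Ratio of the two build times: 459 / 66, a little under 7.
double speedup()
{
  return static_cast<double>(kWithoutTre) / kWithTre;
}
```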


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Marcus D. Hanwell
On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote:
 On 11/23/2011 12:51 PM, Brad King wrote:

 On 11/23/2011 12:48 PM, Brad King wrote:

 On 11/23/2011 12:43 PM, Brad King wrote:

 On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:

 The regex in question is:
     ^[^][:/*?]+\$

  To include a literal ] in the list, make it either the first item

 It must be the [: in this regex that TRE sees as special since it
 allows expressions like [:digit:] inside a bracket expression.

 Still, this is a case that my proposed policy would pick up.

 -Brad

 I am still very wary about this policy.  For 99% of folks the current regex
 is just fine.  Making them eventually change to get the new regex is
 making them do work that they don't need or want.  I would rather have two
 API's.   I just don't see the big upside of TRE, and I see this causing pain
 for lots and lots of folks if we push them to make the change.  CMake has
 most likely 100,000 or more users at this point.  A change like this could
 easily inflict a man years of effort onto the world, and should not be taken
 lightly.

Couldn't they defer by setting the policy to OLD? If they bump the
minimum version the user knows that backward incompatible changes may
be introduced. If there was a way to verify that the output of the
regex were the same with both implementations too, that should reduce
the possibility of subtle bugs.

Marcus


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread David Cole
On Wed, Nov 23, 2011 at 2:09 PM, David Cole david.c...@kitware.com wrote:
 On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com 
 wrote:
 On 11/23/2011 12:51 PM, Brad King wrote:

 On 11/23/2011 12:48 PM, Brad King wrote:

 On 11/23/2011 12:43 PM, Brad King wrote:

 On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote:

 The regex in question is:
     ^[^][:/*?]+\$

  To include a literal ] in the list, make it either the first item

 It must be the [: in this regex that TRE sees as special since it
 allows expressions like [:digit:] inside a bracket expression.

 Still, this is a case that my proposed policy would pick up.

 -Brad

 I am still very wary about this policy.  For 99% of folks the current regex
 is just fine.  Making them eventually change to get the new regex is
 making them do work that they don't need or want.  I would rather have two
 API's.   I just don't see the big upside of TRE, and I see this causing pain
 for lots and lots of folks if we push them to make the change.  CMake has
 most likely 100,000 or more users at this point.  A change like this could
 easily inflict a man years of effort onto the world, and should not be taken
 lightly.

 -Bill


 Big upside:    (quoting from Alexandru Ciobanu's email of Nov. 17th
 earlier in this thread)

 The impact on the build time is pretty dramatic:
     CMake: 7h39m
     CMake + TRE: 1h06m


And although there is a big upside, we do still have to be careful.

We have to remember that regexes are used in the context of ctest -D
invocations, ctest -S script running and cmake -P running, too, where
policies are not really a reliable mechanism. So in addition to having
a careful policy, we also have to decide what to do in those cases.
The case that is in question here for the big performance gain is
ctest running and filtering build output based on regexes. No cmake
policy mechanism in sight for that scenario.


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Bill Hoffman
On Wed, Nov 23, 2011 at 3:24 PM, Sean McBride s...@rogue-research.com wrote:
 On Wed, 23 Nov 2011 14:03:20 -0500, Bill Hoffman said:

For 99% of folks the current regex is just fine.

 AFAICT, this performance bug affects 100% of Xcode generator users.  Even 
 looking at CMake's dashboard, you can see the difference, just search it for 
 'xcode'.  ex:


100% of Xcode users that use ctest to build.  I would still put that
in the 1% of those that build with CMake. A build with CMake does not
go via ctest, but is built on the command line or in the IDE Xcode
which will not have the regex slow down.

Changing just ctest somehow is a much smaller scope than changing
every regex in all of CMake.  I stand by my 99% are OK with the regex
that they have.  :)

-Bill


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Brad King
On 11/23/2011 5:43 PM, Brad King wrote:
 On 11/23/2011 12:44 PM, Brad King wrote:
 However, the above does not need to stand in the way of solving the
 problem you're addressing.  We can simply set that goal aside for
 now by not exposing TRE in the CMake language anywhere.  Use it
 just for cmCTestBuildHandler.
 
 but people kept going on the above part of the debate ;)

After some more thought, I've realized that no approach currently
proposed is practical:

- cmCTestBuildHandler can use a list of custom regular expressions
  so we cannot assume all of them will be compatible with TRE

- As David Cole pointed out there are many places, like CTest's
  -R and -E options, that use regular expressions in contexts
  where we cannot possibly use a policy.  Any attempt to do so in
  such places would just turn into a second API to set the policy
  in the local context of the regex.

- If we add a second API like MATCHES = MATCHES_TRE then we would
  eventually need to do that in *every* place that offers regex
  matching.  That would mean alternatives to the above -R and -E
  options and a lot more.

- People could write code that passes a regex around in a variable.
  This would hide from the author of the regex the context in which
  it will be used, so it is unknown whether it is TRE or traditional.

I propose we go back to an approach discussed the first time PCRE
was proposed.  The indication of the type of regex must be in the
regex itself.  IIRC the proposal was something like

  REGEX:...# old
  PCRE:... # PCRE

Of course that is ambiguous too because the prefixes are valid
expressions.  Instead we can use a prefix that is not otherwise
a valid expression.  We can use an idea from Python:

  http://docs.python.org/library/re.html

that defines expressions of the form (?...) which are not otherwise
valid.  In order to avoid conflict with future use of the constructs
they define, we can use the comment form Python defines:

 (?#OLD)...   # old
 (?#TRE)...   # TRE

This is quite easy to implement.  Just take the currently proposed
patch that replaces use of cmsys::RegularExpression with the new
cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression).
Inside the wrapper look for a leading comment of the above form to
decide which regex impl to use internally.  Then strip off the prefix
and pass the rest of the regex to the underlying implementation.
Once this is done update all the default warning and error regular
expressions that CTest uses.  Add the (?#TRE) prefix to them.

This approach will solve the speed problem, give people access to the
TRE extended features when they want it anywhere CMake already uses
a regex, has no compatibility problems, is a very narrow second
interface, and is extensible for future optional regex behavior.
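
A minimal sketch of the prefix check described above, assuming a helper that runs inside the wrapper before the pattern reaches either implementation. The names RegexEngine, EngineChoice and selectEngine are hypothetical; only the (?#OLD) and (?#TRE) prefixes come from the proposal itself.

```cpp
#include <cassert>
#include <string>

// Which implementation a pattern asks for (hypothetical names).
enum class RegexEngine { Old, Tre };

struct EngineChoice {
  RegexEngine engine;   // implementation to use
  std::string pattern;  // pattern with any recognized prefix removed
};

// Inspect a leading "(?#...)" comment and strip it when recognized.
// Patterns without a recognized prefix keep the old behavior unchanged.
inline EngineChoice selectEngine(const std::string& regex) {
  static const std::string trePrefix = "(?#TRE)";
  static const std::string oldPrefix = "(?#OLD)";
  if (regex.compare(0, trePrefix.size(), trePrefix) == 0) {
    return {RegexEngine::Tre, regex.substr(trePrefix.size())};
  }
  if (regex.compare(0, oldPrefix.size(), oldPrefix) == 0) {
    return {RegexEngine::Old, regex.substr(oldPrefix.size())};
  }
  return {RegexEngine::Old, regex};
}
```

With this sketch, selectEngine("(?#TRE)a+b") selects the TRE engine and passes on the pattern a+b, while an unprefixed pattern is handed to the old engine untouched.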

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-23 Thread Michael Wild
On 11/24/2011 12:34 AM, Brad King wrote:
 On 11/23/2011 5:43 PM, Brad King wrote:
 On 11/23/2011 12:44 PM, Brad King wrote:
 However, the above does not need to stand in the way of solving the
 problem you're addressing.  We can simply set that goal aside for
 now by not exposing TRE in the CMake language anywhere.  Use it
 just for cmCTestBuildHandler.

 but people kept going on the above part of the debate ;)
 
 After some more thought, I've realized that no approach currently
 proposed is practical:
 
 - cmCTestBuildHandler can use a list of custom regular expressions
   so we cannot assume all of them will be compatible with TRE
 
 - As David Cole pointed out there are many places, like CTest's
   -R and -E options, that use regular expressions in contexts
   where we cannot possibly use a policy.  Any attempt to do so in
   such places would just turn into a second API to set the policy
   in the local context of the regex.
 
 - If we add a second API like MATCHES = MATCHES_TRE then we would
   eventually need to do that in *every* place that offers regex
   matching.  That would mean alternatives to the above -R and -E
   options and a lot more.
 
 - People could write code that passes a regex around in a variable.
   This would hide from the author of the regex the context in which
   it will be used, so it is unknown whether it is TRE or traditional.
 
 I propose we go back to an approach discussed the first time PCRE
 was proposed.  The indication of the type of regex must be in the
 regex itself.  IIRC the proposal was something like
 
   REGEX:...# old
   PCRE:... # PCRE
 
 Of course that is ambiguous too because the prefixes are valid
 expressions.  Instead we can use a prefix that is not otherwise
 a valid expression.  We can use an idea from Python:
 
   http://docs.python.org/library/re.html
 
 that defines expressions of the form (?...) which are not otherwise
 valid.  In order to avoid conflict with future use of the constructs
 they define, we can use the comment form Python defines:
 
  (?#OLD)...   # old
  (?#TRE)...   # TRE
 
 This is quite easy to implement.  Just take the currently proposed
 patch that replaces use of cmsys::RegularExpression with the new
 cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression).
 Inside the wrapper look for a leading comment of the above form to
 decide which regex impl to use internally.  Then strip off the prefix
 and pass the rest of the regex to the underlying implementation.
 Once this is done update all the default warning and error regular
 expressions that CTest uses.  Add the (?#TRE) prefix to them.
 
 This approach will solve the speed problem, give people access to the
 TRE extended features when they want it anywhere CMake already uses
 a regex, has no compatibility problems, is a very narrow second
 interface, and is extensible for future optional regex behavior.
 
 -Brad

I like that proposal a lot, although I'm afraid it is a bit verbose.
Some of my regexes are already pretty lengthy, pushing the 80-columns limit.

Michael


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-22 Thread Alexandru Ciobanu

On 2011-11-17, at 3:59 PM, Brad King wrote:

 On 11/17/2011 3:19 PM, Alexandru Ciobanu wrote:
  I was able to make CMake use TRE, by changing the
  RegularExpression.{cxx,hxx.in} files.
 
 Those are down in Source/kwsys which is a directory shared by
 projects other than just CMake.  We cannot touch the files there.
 Instead you will need to re-factor things to go through a wrapper.
 The first stage will just wrap up the KWSys regular expression API.
 The second stage will replace the implementation with TRE.
 
 - Does anyone see a problem if we add TRE in CMake the same
  way as ZLIB, CURL, etc? (i.e. in ./Utilities/)
 
 That should be fine.
 
 -Brad


Hi,

As Brad King suggested, instead of changing the files in Source/kwsys/, I 
created a wrapper class and made all the calls go through it.

I also added the TRE library to Utilities/cmtre, and added CMAKE_USE_SYSTEM_TRE.

I added the patch to the bug tracker:
 http://public.kitware.com/Bug/view.php?id=12381

Needless to say, the patch fixes the performance problem.

To keep things simple I omitted several things:
   - TRE library bootstrapping (so now the Bootstrap test will fail)
   - the suggested policy to enable/disable approximate matching in TRE 
   - proper checks when building TRE with CMake as done in its ./configure.ac 

Before I spend more time on that I would like to get some feedback, and namely:
   - Is the approach correct?
   - What next?

sincerely,
Alex Ciobanu

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-22 Thread Brad King

On 11/22/2011 1:50 PM, Alexandru Ciobanu wrote:

 As Brad King suggested, instead of changing the files in Source/kwsys/,
 I created a wrapper class and made all the calls go through it.

Thanks.


I also added the TRE library to Utilities/cmtre, and added CMAKE_USE_SYSTEM_TRE.

I added the patch to the bug tracker:
http://public.kitware.com/Bug/view.php?id=12381


Please add a note there indicating the CMake version (git commit sha1)
on which the patch was based.  Otherwise I cannot apply it cleanly.


Needless to say, the patch fixes the performance problem.


Great!


To keep things simple I omitted several things:
- TRE library bootstrapping (so now the Bootstrap test will fail)


The KWSys implementation will not be going away, so we can fall back to
that one during bootstrapping.


- the suggested policy to enable/disable approximate matching in TRE


Please read up on policies to make sure you understand them:

  http://www.cmake.org/Wiki/CMake/Policies
  http://www.cmake.org/cmake/help/cmake-2-8-docs.html#command:cmake_policy
  http://www.cmake.org/cmake/help/cmake-2-8-docs.html#section_Policies

We will need a policy to know how to treat a regex containing one of
the characters that behaves differently in TRE.  The OLD behavior of the
policy will escape them to get the old matching behavior.  The NEW behavior
of the policy will use the new matching features.

We also need to identify the contexts that offer regex matching but have
no way to set the policy.  For those we need to decide if we can simply
use the new behavior outright or provide another way to switch it.

It is tempting to always require explicit requests for new TRE behavior,
such as using TRE instead of REGEX in keyword locations, but one
advantage of using a policy is that over time the old behavior will
disappear completely from usage.


- proper checks when building TRE with CMake as done in its ./configure.ac


IOW, porting TRE to build properly with CMake, right?


- Is the approach correct?


Yes.  I will review the patch in more detail next week and after I know
where to apply it.


- What next?


We need to establish a transition plan, which mostly consists of the above
policy discussion.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-22 Thread Alexandru Ciobanu
 
 I also added the TRE library to Utilities/cmtre, and added 
 CMAKE_USE_SYSTEM_TRE.
 
 I added the patch to the bug tracker:
 http://public.kitware.com/Bug/view.php?id=12381
 
 Please add a note there indicating the CMake version (git commit sha1)
 on which the patch was based.  Otherwise I cannot apply it cleanly.
 

The commit that the patch is based on is:
5675ec5e493e01e10d9ad8d8b60eac62033f31c2

I added a note to the bug tracker.


 To keep things simple I omitted several things:
 - TRE library bootstrapping (so now the Bootstrap test will fail)
 
 The KWSys implementation will not be going away, so we can fall back to
 that one during bootstrapping.

This is a good idea.


 - proper checks when building TRE with CMake as done in its ./configure.ac
 
 IOW, porting TRE to build properly with CMake, right?

Yes, there are some checks, find headers, find types, etc. But all these 
operations have equivalents in CMake. So it should be straightforward.


sincerely,
Alex Ciobanu

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-17 Thread Alexandru Ciobanu
Hi everyone,

[ CMake + TRE ]
I was able to make CMake use TRE, by changing the
RegularExpression.{cxx,hxx.in} files.

I ran the CMake tests, and 100% pass. See the attached log file.
(NOTE: Bootstrap, complex, complexOne were initially not aware of the TRE
dependency, but I fixed that easily).

[ Impact of using CMake + TRE on our builds ]
We picked one of our build machines and replaced the ctest binary on it.
The impact on the build time is pretty dramatic:
  CMake:       7h39m
  CMake + TRE: 1h06m

This is a machine that has two cores.

On machines that have more cores, the impact is even greater. On my 8 core
machine, running a particular build task:
  CMake:       19m57s
  CMake + TRE: 1m30s

[ Regular expressions syntax ]
In terms of regular expressions syntax, the only difference that I've seen
is that TRE treats the curly brackets "{" and "}" as special characters,
because it uses them for its "approximate matching". Details here:
  http://laurikari.net/tre/documentation/regex-syntax/

The only CMake component that uses curly brackets in a regexp is:
  Modules/FindJNI.cmake
but it was trivial to fix because they were used as mere delimiters.

As mentioned earlier, after this change 100% of the tests pass.

[ Implications ]
Note that CTest is *not* the only component that would benefit from faster
regular expressions.

I've found at least one other reported case where regular expressions were
too slow in CMake:
  http://public.kitware.com/Bug/print_bug_page.php?bug_id=5537

Since Glob uses RegularExpression, I would not be surprised if CMake+TRE
will be faster on large code bases.

CONCLUSION:
  - TRE is fast, benefits build times immensely

QUESTION:
  - Does anyone see a problem if we add TRE in CMake the same way as ZLIB,
    CURL, etc? (i.e. in ./Utilities/)

sincerely,
Alex Ciobanu

time.ctest.alex.log
Description: Binary data

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-17 Thread Alexander Neundorf
On Thursday 17 November 2011, Alexandru Ciobanu wrote:
 Hi everyone,
 
 [ CMake + TRE ]
 I was able to make CMake use TRE, by changing the
 RegularExpression.{cxx,hxx.in} files.
 
 I ran the CMake tests, and 100% pass. See the attached log file.
 (NOTE: Bootstrap, complex, complexOne were initially not aware of TRE
 dependency, but I fixed that easily).

Cool :-)

 [ Impact of using CMake + TRE on our builds ]
 We picked one of our build machines and replaced the ctest binary on it.
 The impact on the build time is pretty dramatic:
  CMake:   7h39m
  CMake + TRE: 1h06m
 
 This is a machine that has two cores.
 
 On machines that have more cores, the impact is even greater.  On my 8 core
 machine, running a particular build task:
  CMake:       19m57s
  CMake + TRE: 1m30s
 
 
 [ Regular expressions syntax ]
 In terms of regular expressions syntax, the only difference that I've seen
 is that TRE treats the curly brackets { and } as special characters,
 because it uses them for its approximate matching.  Details here:
 http://laurikari.net/tre/documentation/regex-syntax/
 
 The only CMake component that uses curly brackets in a regexp is:
 Modules/FindJNI.cmake
 but it was trivial to fix because they were used as mere delimiters.

Well, but there are cmake files out there (i.e. all existing cmake-based 
projects) which also must behave basically exactly the same as before, 
otherwise their builds might break.

Not sure how to achieve this.
A policy ?

Alex


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-17 Thread Alexandru Ciobanu

On 2011-11-17, at 3:26 PM, Alexander Neundorf wrote:

 [ Regular expressions syntax ]
 In terms of regular expressions syntax, the only difference that I've seen
 is that TRE treats the curly brackets { and } as special characters,
 because it uses them for its approximate matching.  Details here:
 http://laurikari.net/tre/documentation/regex-syntax/
 
 The only CMake component that uses curly brackets in a regexp is:
Modules/FindJNI.cmake
 but it was trivial to fix because they were used as mere delimiters.
 
 Well, but there are cmake files out there (i.e. all existing cmake-based 
 projects) which also must behave basically exactly the same as before, 
 otherwise their builds might break.
 
 Not sure how to achieve this.
 A policy ?
 

Actually it is very easy to make it transparent and thus not need to modify any 
.cmake files.

We just need to escape the curly brackets:
   {   ->   \{
   }   ->   \}
in the regular expression before compiling it.

This way we'll have full compatibility with previous regexp syntax.
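
The escaping step above could look roughly like the following sketch. The helper name escapeBraces is hypothetical, and the sketch makes two assumptions: an existing backslash pair such as \{ must be left alone, and bracket expressions are not treated specially. The second point is a limitation, since a pattern like [{}] would also receive backslashes, which may change its meaning in engines where a backslash is literal inside brackets.

```cpp
#include <cassert>
#include <string>

// Escape unescaped '{' and '}' so that an engine treating them as special
// sees them as literal characters. Hypothetical helper, a sketch only.
inline std::string escapeBraces(const std::string& regex) {
  std::string out;
  out.reserve(regex.size() + 8);
  for (std::string::size_type i = 0; i < regex.size(); ++i) {
    const char c = regex[i];
    if (c == '\\' && i + 1 < regex.size()) {
      // Copy an existing escape pair untouched, so "\{" is not doubled.
      out += c;
      out += regex[++i];
    } else if (c == '{' || c == '}') {
      out += '\\';
      out += c;
    } else {
      out += c;
    }
  }
  return out;
}
```

For example, the pattern a{b} would be compiled as a\{b\}, while a pattern that already contains \{ passes through unchanged.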

sincerely,
Alex Ciobanu



Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-17 Thread Brad King

On 11/17/2011 3:19 PM, Alexandru Ciobanu wrote:
 I was able to make CMake use TRE, by changing the
 RegularExpression.{cxx,hxx.in} files.

Those are down in Source/kwsys which is a directory shared by
projects other than just CMake.  We cannot touch the files there.
Instead you will need to re-factor things to go through a wrapper.
The first stage will just wrap up the KWSys regular expression API.
The second stage will replace the implementation with TRE.
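
As an illustration of the first stage, here is a minimal sketch of such a wrapper. Everything in it is an assumption: the class name RegexWrapper is hypothetical, the compile/find/match method names are chosen only to suggest a familiar regex API, and std::regex stands in for the real backend because the KWSys headers are not available in a standalone example.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper: callers use only this class, so the engine behind
// it can be replaced later without touching them. std::regex is a stand-in.
class RegexWrapper {
public:
  // Returns false when the pattern is invalid.
  bool compile(const std::string& pattern) {
    try {
      regex_.assign(pattern);
      valid_ = true;
    } catch (const std::regex_error&) {
      valid_ = false;
    }
    return valid_;
  }

  // Search text for the compiled pattern and store copies of the submatches.
  bool find(const std::string& text) {
    matches_.clear();
    if (!valid_) return false;
    std::smatch results;
    if (!std::regex_search(text, results, regex_)) return false;
    for (const auto& sub : results) matches_.push_back(sub.str());
    return true;
  }

  // Submatch n from the last successful find(); 0 is the whole match.
  std::string match(std::size_t n) const {
    return n < matches_.size() ? matches_[n] : std::string();
  }

private:
  std::regex regex_;
  std::vector<std::string> matches_;
  bool valid_ = false;
};
```

Because the submatches are copied into the wrapper, callers never see a type that belongs to the underlying engine, which is what would let the second stage swap the implementation.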


 - Does anyone see a problem if we add TRE in CMake the same
   way as ZLIB, CURL, etc? (i.e. in ./Utilities/)

That should be fine.

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-17 Thread Brad King

On 11/17/2011 4:28 PM, Sean McBride wrote:

Has using the POSIX regex.h APIs been ruled out?


Windows?

-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Alexandru Ciobanu
Hi,

I was successful in making CMake work with PCRE. As expected, it was 
straightforward.

The problem is that PCRE is also slow. So, I tested the same string and regex 
with multiple different libraries in order to assess performance. 

The regular expression in question is:
  ([^:]+): warning[ \t]*[0-9]+[ \t]*:

The string is a 6k character string, a typical compiler command line. (See my 
first message for sample code).

For each library the steps are:
   - regcomp() the regular expression 
   - regexec() the expression on the string 

Here is how much time it takes to process the string *one* time:
current CMake   -- 860ms
TRex  --  680ms
PCRE  -- 610ms  ( with pcre_exec() )
PCRE  -- 990ms  ( with pcre_dfa_exec() )
re2  --  0.085ms
/usr/include/regex.h  -- 0.075ms

As it can be seen re2 and the standard regex.h are orders of magnitude faster 
in executing this particular regular expression. 

The difference between PCRE and re2 is also confirmed by this study:
http://swtch.com/~rsc/regexp/regexp3.html

CONCLUSION:
   - PCRE is not fast enough

QUESTION:
   - is there a reason we shouldn't use the standard regex.h?

sincerely,
Alex Ciobanu



On 2011-11-15, at 10:30 AM, Pau Garcia i Quiles wrote:

 Hi,
 
 If it's of any help, I used the pcrecpp library by Google (it's part
 of PCRE). With pcrecpp, most operations were only 1-3 lines long. The
 only problem I found is PCRE provided no way to get the previous/next
 match, which CMake needs.
 
 
 
 On Tue, Nov 15, 2011 at 4:25 PM, Alexandru Ciobanu
 a...@rogue-research.com wrote:
 Hi Bill and Pau,
 
 I am currently working on adding PCRE to CMake. Chances are very high that 
 it will work, given the very similar comp()/exec() API calls in both 
 implementations.
 
 I'll let you know about the results soon.
 
 Alex
 
 
 On 2011-11-14, at 10:31 PM, Bill Hoffman wrote:
 
 On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote:
 Bill,
 
 I think the current incarnation of regexps in CMake should be kept for
 compatibility reasons.
 
 Yes, of course.
 
 Adding PCRE is not difficult, just time consuming. The implementation
 I'd do would be an additional abstraction layer:
 - For the current BRE implementation, it would be a 1:1 call match
 - For the PCRE implementation, it would keep match status, count,
 next/previous iterators, etc.
 
 So, for this case I would be interested to hear from Alex to see if 
 swapping out the regex will fix the ctest performance issue.  It is a nice 
 isolated place to give PCRE a try.
 
 -Bill
 
 
 
 
 
 -- 
 Pau Garcia i Quiles
 http://www.elpauer.org
 (Due to my workload, I may need 10 days to answer)


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Alexandru Ciobanu
Hi Brad,

[1]

 On 11/16/2011 12:44 PM, Alexandru Ciobanu wrote:
 For each library the steps are:
 - regcomp() the regular expression
 - regexec() the expression on the string
 
 Can you time each of these steps separately for each library?  I would not
 be surprised if the compilation time is the bottleneck.  The evaluation and
 matching of a given string just followed a DFA which should be pretty fast.
 If it turns out that compilation is the bottleneck then we should refactor
 things to make sure CTest compiles each regex at most once so we can re-use
 the same DFA every time.


This is how I run the tests (pseudocode):
   regcomp()
   repeat 1000 times:
       regexec()

The times I reported are the total run times divided by 1000.
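
The measurement loop above could be written roughly as follows against the POSIX regex.h API. This is a sketch under stated assumptions, not the attached test program: the function name averageMatchMillis, the REG_EXTENDED flag and the use of std::chrono for timing are all assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <regex.h>  // POSIX API; not available with every Windows toolchain

// Compile once, call regexec() `repetitions` times, and return the average
// time per match attempt in milliseconds. Returns -1.0 if the pattern does
// not compile or repetitions is not positive. Hypothetical helper.
inline double averageMatchMillis(const std::string& pattern,
                                 const std::string& subject,
                                 int repetitions) {
  if (repetitions <= 0) return -1.0;
  regex_t re;
  if (regcomp(&re, pattern.c_str(), REG_EXTENDED) != 0) {
    return -1.0;
  }
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < repetitions; ++i) {
    regmatch_t match[2];
    // The result is ignored on purpose: only the elapsed time matters here.
    (void)regexec(&re, subject.c_str(), 2, match, 0);
  }
  const auto stop = std::chrono::steady_clock::now();
  regfree(&re);
  const std::chrono::duration<double, std::milli> elapsed = stop - start;
  return elapsed.count() / repetitions;
}
```

Since regcomp() sits outside the timed loop, a high average from this kind of harness points at matching rather than compilation.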

For the slow ones (TRex, PCRE, CMake regexp) I repeat only 10 times; 
otherwise I wait too long. So it seems that regcomp() is not the problem in 
this case.

[2]
I have just tested another library - TRE. 

It performs well, I will put it in context:
current CMake   -- 860ms
TRex  --  680ms
PCRE  -- 610ms  ( with pcre_exec() )
PCRE  -- 990ms  ( with pcre_dfa_exec() )
re2  --  0.085ms
/usr/include/regex.h  -- 0.075ms 
TRE  --  0.3ms   ( NEW )

Advantages of TRE:
  - API very similar to standard regex.h (i.e. easy to integrate with CMake)
  - supports wide characters
  - compiles on many platforms Windows, AIX, HP-UX, you name it.

What do you think about TRE?

sincerely,
Alex Ciobanu



tre.test.c
Description: Binary data



Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Alexander Neundorf
On Wednesday 16 November 2011, Alexandru Ciobanu wrote:
 Hi Brad,
...
 Advantages of TRE:
   - API very similar to standard regex.h (i.e. easy to integrate with CMake)
   - supports wide characters
   - compiles on many platforms Windows, AIX, HP-UX, you name it.
 
 What do you think about TRE?

http://laurikari.net/tre/about/
BSD licensed; gcc, IBM, HP, and Sun compilers are supported, as is MSVC, 
including version 6.

So from that side it looks good.

Docs for the supported syntax:
http://laurikari.net/tre/documentation/regex-syntax/

Alex


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Brad King

On 11/16/2011 2:12 PM, Alexandru Ciobanu wrote:

This is how I run the tests (pseudocode):
   regcomp()
   repeat 1000 times:
       regexec()


Thanks for the explanation.


 TRex  --  680ms
 PCRE  -- 610ms  ( with pcre_exec() )
 PCRE  -- 990ms  ( with pcre_dfa_exec() )
 re2  --  0.085ms
 /usr/include/regex.h  -- 0.075ms
 TRE  --  0.3ms


The performance variation is interesting.  It is probably worthwhile
to use a profiling tool (such as valgrind --tool=callgrind and kcachegrind)
to see where PCRE is spending its time.


Advantages of TRE:
   - API very similar to standard regex.h (i.e. easy to integrate with CMake)
   - supports wide characters
   - compiles on many platforms Windows, AIX, HP-UX, you name it.

What do you think about TRE?


It looks like a promising candidate.

Thanks,
-Brad


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Bill Hoffman
A little off topic, but I am wondering if the ctest performance issue 
for xcode could be fixed without changing the regex.  The problem with 
xcode is that it spits out very verbose output.  I am wondering if some 
short circuit stuff could be put in place.  Maybe do a string compare of 
the first bit of every line that looks for stuff that could not have an 
error in it, and only if it might have an error, do we pass it to the 
regex call.  Basically, if we could reduce the amount of data going into 
the regex stuff, it should work as well as it does for other compilers. 
 The place that this code could go is into cmakexbuild.cxx which 
already strips out all lines that start with setenv.  Maybe even some 
hard coded stuff that looks for errors and only puts those out.  i.e. 
only output lines from cmakexbuild.cxx if there are errors.
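
A rough sketch of the short-circuit idea, assuming a cheap substring test that runs before any regex call. The function names and the keyword list ("warning", "error") are assumptions for illustration, and std::regex stands in for the regex engine. A keyword filter like this would miss any diagnostic that is phrased differently.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Cheap test run before any regex: returns false for lines that cannot
// contain a diagnostic. The keyword list is an assumption for illustration.
inline bool mightContainDiagnostic(const std::string& line) {
  if (line.compare(0, 6, "setenv") == 0) {
    return false;  // lines starting with setenv are never diagnostics here
  }
  return line.find("warning") != std::string::npos ||
         line.find("error") != std::string::npos;
}

// Count lines matching the warning regex, skipping the regex call entirely
// for lines rejected by the prefilter.
inline int countWarnings(const std::vector<std::string>& lines) {
  static const std::regex warning("([^:]+): warning[ \t]*[0-9]+[ \t]*:");
  int count = 0;
  for (const std::string& line : lines) {
    if (mightContainDiagnostic(line) && std::regex_search(line, warning)) {
      ++count;
    }
  }
  return count;
}
```

In this sketch a long compiler invocation line that mentions neither keyword never reaches the regex.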


-Bill






Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-16 Thread Bill Hoffman

On 11/16/2011 4:11 PM, Sean McBride wrote:



 The downside is that this solution would be fragile. xcodebuild's
 output is not guaranteed to be the same forever, it's not like a public
 API. Already today, gcc and clang output pretty differently. I'm a
 little worried this would bite us.

 Besides, improving regex performance would be a win everywhere, not
 just in this case.



Still, we are already filtering the output some, and it is way too 
verbose which is why it is the only place where this is a problem.  It 
might be worth exploring as a faster path to getting things working for you.


-Bill


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-15 Thread Pau Garcia i Quiles
Hi,

If it's of any help, I used the pcrecpp library by Google (it's part
of PCRE). With pcrecpp, most operations were only 1-3 lines long. The
only problem I found is PCRE provided no way to get the previous/next
match, which CMake needs.



On Tue, Nov 15, 2011 at 4:25 PM, Alexandru Ciobanu
a...@rogue-research.com wrote:
 Hi Bill and Pau,

 I am currently working on adding PCRE to CMake. Chances are very high that 
 it will work, given the very similar comp()/exec() API calls in both 
 implementations.

 I'll let you know about the results soon.

 Alex


 On 2011-11-14, at 10:31 PM, Bill Hoffman wrote:

 On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote:
 Bill,

 I think the current incarnation of regexps in CMake should be kept for
 compatibility reasons.

 Yes, of course.

 Adding PCRE is not difficult, just time consuming. The implementation
 I'd do would be an additional abstraction layer:
 - For the current BRE implementation, it would be a 1:1 call match
 - For the PCRE implementation, it would keep match status, count,
 next/previous iterators, etc.

 So, for this case I would be interested to hear from Alex to see if swapping 
 out the regex will fix the ctest performance issue.  It is a nice isolated 
 place to give PCRE a try.

 -Bill





-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)


Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-14 Thread Pau Garcia i Quiles
Bill,

I think the current incarnation of regexps in CMake should be kept for
compatibility reasons.

Adding PCRE is not difficult, just time consuming. The implementation I'd
do would be an additional abstraction layer:
- For the current BRE implementation, it would be a 1:1 call match
- For the PCRE implementation, it would keep match status, count,
next/previous iterators, etc.
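
To make the abstraction-layer idea above more concrete, here is a rough sketch under stated assumptions. The interface and class names are hypothetical and are not CMake's API. The backend shown is built on std::regex (default ECMAScript syntax) purely as a stand-in so the example is self-contained; a PCRE-backed class would implement the same interface and keep its own match state.

```cpp
#include <cstddef>
#include <memory>
#include <regex>
#include <string>

// Hypothetical interface; the names are illustrative, not CMake's API.
class RegexBackend
{
public:
  virtual ~RegexBackend() = default;
  // Compiles the pattern; returns false if it is not valid.
  virtual bool Compile(const std::string& pattern) = 0;
  // Finds the first match in text; returns false if there is none.
  virtual bool Find(const std::string& text) = 0;
  // Capture group n of the last successful Find (0 is the whole match).
  virtual std::string Match(std::size_t n) const = 0;
};

// Stand-in backend using std::regex. A PCRE backend would be a second
// class deriving from RegexBackend (assumption, not shown here).
class StdRegexBackend : public RegexBackend
{
public:
  bool Compile(const std::string& pattern) override
  {
    try {
      this->Regex.assign(pattern);
      return true;
    } catch (const std::regex_error&) {
      return false;
    }
  }

  bool Find(const std::string& text) override
  {
    // Keep a copy so the stored match results never dangle.
    this->Subject = text;
    return std::regex_search(this->Subject, this->Results, this->Regex);
  }

  std::string Match(std::size_t n) const override
  {
    return n < this->Results.size() ? this->Results[n].str() : std::string();
  }

private:
  std::regex Regex;
  std::string Subject;
  std::smatch Results;
};

// Callers depend only on the interface, so the backend can be swapped.
std::unique_ptr<RegexBackend> MakeDefaultBackend()
{
  return std::unique_ptr<RegexBackend>(new StdRegexBackend);
}
```

The point of the indirection is that existing callers keep their current behaviour through one backend, while a second backend can be tried in an isolated place without touching them.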


On Mon, Nov 14, 2011 at 7:30 PM, Bill Hoffman bill.hoff...@kitware.com wrote:

 Sorry for the top post...  However, if the issue with ctest being slow can
 be fixed by using PCRE in CMake, that is good news.  We can just link in
 the library, and replace that small part of CMake internal code that has
 the performance problem.  This should not break backwards compatibility.
  It also gives us a way to slowly bring in PCRE into CMake.

 Alex, is there a way you can try PCRE in CMake to see if it fixes the
 problem?

 -Bill



 On 11/14/2011 1:13 PM, Pau Garcia i Quiles wrote:

 Hi,

 Check this:

 A wish a day 11: Perl Compatible Regular Expressions in CMake
 http://www.elpauer.org/?p=684

 Unfortunately the student turned out to be a total fraud: he knew
 nothing about CMake, regular expressions (much less PCRE!), git, and
 could barely manage with C/C++. After months of explaining *really*
 basic stuff (such as the difference between a static and a shared
 library), he silently gave up.

 I do have an initial implementation and extensive information on how to
 implement PCRE in CMake. It's just I don't have enough spare time to do
 that, and at work I cannot justify investing so much time in CMake for
 free (for now, we don't need advanced regular expressions)


 On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu
 a...@rogue-research.com wrote:

Hi,

Our team is affected by issue 0012381, that causes extremely poor
performance by CTest. Details here:

 http://public.kitware.com/Bug/view.php?id=12381

I've created a small test case that demonstrates the problem. Please
find the .cpp file attached.

 From what I see, the RegularExpression class uses Henry Spencer
regex implementation, which is known to be slow for some cases.

On my machine, the attached example runs in 0.8 sec. Just to process
one string!
   $ time ./repr
   real 0m0.865s
   user 0m0.862s
   sys  0m0.002s

Grep can process 100k such strings in 0.5 sec (which includes
reading a 570MB file from disk):
   $ wc -l big.str.txt
  10 big.str.txt
   $ ls -lh big.str.txt
   -rw-r--r--  1 alex  staff   572M 14 Nov 12:30 big.str.txt
   $ time grep ([^:]+): warning[ \t]*[0-9]+[ \t]*: big.str.txt
   real 0m0.525s
   user 0m0.255s
   sys  0m0.269s

I see three ways to fix this problem:
  A) use a trusted 3rd party regex library, like re2 or pcre
  B) find another self-contained regex implementation
  C) try to use the standard POSIX regex available in regex.h on
most systems
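
For reference, option C could look roughly like the sketch below. It uses the standard POSIX regcomp/regexec/regfree calls from regex.h; the helper name and the constant are hypothetical, and this says nothing about how fast a given platform's implementation is on the problematic input.

```cpp
#include <regex.h>
#include <string>

// The warning pattern from the grep example. The "\t" escapes are turned
// into real tab characters by the C++ compiler, since POSIX bracket
// expressions do not interpret backslash escapes themselves.
const char* const kWarningPattern = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";

// Hypothetical helper: returns true if line matches the POSIX extended
// regular expression in pattern, false on no match or a bad pattern.
bool PosixMatches(const char* pattern, const std::string& line)
{
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return false; // pattern failed to compile
  }
  const bool matched = regexec(&re, line.c_str(), 0, nullptr, 0) == 0;
  regfree(&re);
  return matched;
}
```

In real use the pattern would be compiled once and reused for every line rather than recompiled per call as in this sketch.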

I tried to find another self-contained regex implementation, that we
could use. I found Tiny REX, but it is as slow, in this case, as
Henry Spencer's implementation.

So what do you think is the best way to proceed about this problem?

sincerely,
Alex Ciobanu



-- 
Pau Garcia i Quiles
http://www.elpauer.org
(Due to my workload, I may need 10 days to answer)

Re: [cmake-developers] slow regex implementation in RegularExpression

2011-11-14 Thread Bill Hoffman

On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote:

 Bill,

 I think the current incarnation of regexps in CMake should be kept for
 compatibility reasons.

Yes, of course.

 Adding PCRE is not difficult, just time consuming. The implementation
 I'd do would be an additional abstraction layer:
 - For the current BRE implementation, it would be a 1:1 call match
 - For the PCRE implementation, it would keep match status, count,
 next/previous iterators, etc.

So, for this case I would be interested to hear from Alex to see if 
swapping out the regex will fix the ctest performance issue.  It is a 
nice isolated place to give PCRE a try.


-Bill