Re: [cmake-developers] slow regex implementation in RegularExpression
Hi, We found a workaround that does not require any source code modifications. I added the description to the bug report: http://public.kitware.com/Bug/view.php?id=12381#c27872 In short, we filter out the longer lines in the build output, so that CTest does not spend hours regex-matching them. In a way, this is like the short-circuit stuff that Bill Hoffman suggested earlier in this thread. On 2011-11-16, at 3:50 PM, Bill Hoffman wrote: A little off topic, but I am wondering if the ctest performance issue for xcode could be fixed without changing the regex. The problem with xcode is that it spits out very verbose output. I am wondering if some short circuit stuff could be put in place. Maybe do a string compare of the first bit of every line, looking for stuff that could not have an error in it, and only if it might have an error, do we pass it to the regex call. Basically, if we could reduce the amount of data going into the regex stuff, it should work as well as it does for other compilers. The place that this code could go is into cmakexbuild.cxx which already strips out all lines that start with setenv. Maybe even some hard coded stuff that looks for errors and only puts those out. i.e. only output lines from cmakexbuild.cxx if there are errors. Since it is similar to the way that cmakexbuild.cxx filters out setenv lines, it would be possible to add this additional filter. The only drawback would be that the compiler invocation lines will not be in the logs anymore (since these are the longest lines that we eliminate). If wanted, I can submit the small patch with the modifications to cmakexbuild.cxx. sincerely, Alex Ciobanu -- Powered by www.kitware.com Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ Follow this link to subscribe/unsubscribe: http://public.kitware.com/cgi-bin/mailman/listinfo/cmake-developers
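Bill's short-circuit idea and Alex's workaround share one shape: a cheap string check decides whether a line is worth the expensive regex scan at all. The C++ sketch below is illustrative only and is not the actual cmakexbuild.cxx code. `WorthScanning`, `FilterBuildOutput`, and the 512-character cutoff are hypothetical; the thread does not say what length the workaround uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical cutoff: the workaround drops "the longer lines", but the thread
// does not give the exact length, so 512 is only a placeholder.
constexpr std::size_t kMaxScannedLineLength = 512;

// Cheap pre-check: return true only if the line should go on to the slow
// per-line regex scan for errors and warnings.
bool WorthScanning(const std::string& line) {
  // cmakexbuild.cxx already strips lines that start with "setenv".
  if (line.rfind("setenv", 0) == 0) {
    return false;
  }
  // Very long lines (typically full compiler invocations) are skipped.
  if (line.size() > kMaxScannedLineLength) {
    return false;
  }
  return true;
}

// Keeps only the lines that pass the cheap pre-check.
std::vector<std::string> FilterBuildOutput(const std::vector<std::string>& lines) {
  std::vector<std::string> kept;
  for (const std::string& line : lines) {
    if (WorthScanning(line)) {
      kept.push_back(line);
    }
  }
  return kept;
}
```

Only the surviving lines reach the error and warning regexes. As Alex notes, the price is that long compiler invocation lines disappear from the logs.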
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote: Hi, We found a workaround that does not require any source code modifications. I added the description to the bug report: http://public.kitware.com/Bug/view.php?id=12381#c27872 Good, glad you are up and running again. ... Since it is similar to the way that cmakexbuild.cxx filters out setenv lines, it would be possible to add this additional filter. The only drawback would be that the compiler invocation lines will not be in the logs anymore (since these are the longest lines that we eliminate). If wanted, I can submit the small patch with the modifications to cmakexbuild.cxx. I am wondering if we might want to make this an option to cmakexbuild?

    cmakexbuild -skip-lines=/Developer/usr/bin

which could be something you would put in your build command:

    SET (CTEST_BUILD_COMMAND "cmakexbuild -skip-lines=/Developer/usr/bin")

Then, if you are seeing significant performance issues, you could add this. -Bill
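The `-skip-lines=` option Bill floats here is only a proposal in this thread, so the C++ sketch below is hypothetical. `ParseSkipPrefixes` and `ShouldSkip` are illustrative names. Ignoring leading blanks is an assumption about how Xcode indents compiler invocation lines. The sketch shows how a prefix-based skip list could be collected from the command line and applied to each line of output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: collects every PREFIX given as "-skip-lines=PREFIX".
// Empty prefixes are ignored so they cannot suppress all output.
std::vector<std::string> ParseSkipPrefixes(const std::vector<std::string>& args) {
  const std::string flag = "-skip-lines=";
  std::vector<std::string> prefixes;
  for (const std::string& arg : args) {
    if (arg.rfind(flag, 0) == 0 && arg.size() > flag.size()) {
      prefixes.push_back(arg.substr(flag.size()));
    }
  }
  return prefixes;
}

// Returns true if the line (after leading blanks, an assumption about Xcode's
// indentation) starts with any requested prefix.
bool ShouldSkip(const std::string& line, const std::vector<std::string>& prefixes) {
  const std::size_t start = line.find_first_not_of(" \t");
  if (start == std::string::npos) {
    return false;  // blank line: nothing to skip
  }
  for (const std::string& prefix : prefixes) {
    if (line.compare(start, prefix.size(), prefix) == 0) {
      return true;
    }
  }
  return false;
}
```

Making the filter opt-in keeps the default output unchanged. Only users who hit the slowdown would pass the option through `CTEST_BUILD_COMMAND`.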
Re: [cmake-developers] slow regex implementation in RegularExpression
On Tue, Nov 29, 2011 at 1:05 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote: Hi, We found a workaround that does not require any source code modifications. I added the description to the bug report: http://public.kitware.com/Bug/view.php?id=12381#c27872 Good, glad you are up and running again. ... Does this mean we won't see any new and better regexp code in CMake? I for one was looking forward to a better regexp engine. James
Re: [cmake-developers] slow regex implementation in RegularExpression
On Tue, Nov 29, 2011 at 3:44 PM, James Bigler jamesbig...@gmail.com wrote: On Tue, Nov 29, 2011 at 1:05 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/29/2011 2:41 PM, Alexandru Ciobanu wrote: Hi, We found a workaround that does not require any source code modifications. I added the description to the bug report: http://public.kitware.com/Bug/view.php?id=12381#c27872 Good, glad you are up and running again. ... Does this mean we won't see any new and better regexp code in CMake? I for one was looking forward to a better regexp engine. James I think we're going to take it slow and make it well-thought-out before we put a new regex engine in... It will not be appearing in the upcoming 2.8.7 since nothing is even in 'next' yet, and we are still discussing the options. We'll keep sleeping on it till it seems cooked enough before we put something in 'next' to try out. Thx, David
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wednesday 23 November 2011, David Cole wrote: On Wed, Nov 23, 2011 at 2:09 PM, David Cole david.c...@kitware.com wrote: On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/23/2011 12:51 PM, Brad King wrote: On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad I am still very wary about this policy. For 99% of folks the current regex is just fine. Making them eventually change to get the new regex is making them do work that they don't need or want. I would rather have two APIs. I just don't see the big upside of TRE, and I see this causing pain for lots and lots of folks if we push them to make the change. CMake has most likely 100,000 or more users at this point. A change like this could easily inflict man-years of effort onto the world, and should not be taken lightly. -Bill Big upside: (quoting from Alexandru Ciobanu's email of Nov. 17th earlier in this thread) The impact on the build time is pretty dramatic: CMake: 7h39m CMake + TRE: 1h06m And although there is a big upside, we do still have to be careful. We have to remember that regexes are used in the context of ctest -D invocations, ctest -S script running and cmake -P running, too, where policies are not really a reliable mechanism.
So in addition to having a careful policy, we also have to decide what to do in those cases. The case that is in question here for the big performance gain is ctest running and filtering build output based on regexes. No cmake policy mechanism in sight for that scenario. Also, AFAIK TRE supports more stuff than the current regexps. This is good. But doesn't that mean that there could be existing regexps containing sequences which right now don't have a special meaning, but with TRE suddenly get a special meaning and change the matching? Does e.g. the following produce the same result? I get ":" as match here.

    set(text "hello world [:digit:]")
    if ("${text}" MATCHES ".+([:digit:])")
      message(STATUS "match: \"${CMAKE_MATCH_1}\"")
    endif()

Alex
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/22/2011 4:39 PM, Brad King wrote: It is tempting to always require explicit requests for new TRE behavior, such as using TRE instead of REGEX in keyword locations, but one advantage of using a policy is that over time the old behavior will disappear completely from usage. I am pretty sure the last time we talked about adding a new regex we talked about requiring explicit requests. I think this would be a much safer approach. I am really scared that this regex will not be compatible with the old one, and it will break lots of stuff in very subtle ways that are hard for people to detect. It is not that much code to have both. Where performance is an issue, we can swap it out, and when people need better regex they can use TRE as well. I don't think the pain will be worth getting rid of the old usage. -Bill
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wed, Nov 23, 2011 at 8:36 AM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/22/2011 4:39 PM, Brad King wrote: It is tempting to always require explicit requests for new TRE behavior, such as using TRE instead of REGEX in keyword locations, but one advantage of using a policy is that over time the old behavior will disappear completely from usage. I am pretty sure the last time we talked about adding a new regex we talked about requiring explicit requests. I think this would be a much safer approach. I am really scared that this regex will not be compatible with the old one, and it will break lots of stuff in very subtle ways that are hard for people to detect. It is not that much code to have both. Where performance is an issue, we can swap it out, and when people need better regex they can use TRE as well. I don't think the pain will be worth getting rid of the old usage. -Bill Why can't this be solved with a policy? One problem of using an explicit TRE command is that if you want to write code that *could* be used in an older version of CMake you won't be able to use it. I agree that making the usage explicit would allow for complete backward compatibility, but it clutters the language. Do you want to have two versions of every regular expression syntax? GLOB vs GLOB_TRE? MATCHES vs MATCHES_TRE? Another argument against using TRE explicitly is that these words tell me nothing about what that function does unless I'm extremely familiar with the intricacies of CMake script.
I think we want to make CMake easier to use, not harder. James
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 11:43 AM, James Bigler wrote: Why can't this be solved with a policy? One problem of using an explicit TRE command is that if you want to write code that *could* be used in an older version of CMake you won't be able to use it. It could be, but that will not come without pain. I can see a future where this policy is kept off in many projects. I agree that making the usage explicit would allow for complete backward compatibility, but it clutters the language. Do you want to have two versions of every regular expression syntax? GLOB vs GLOB_TRE? MATCHES vs MATCHES_TRE? Another argument against using TRE explicitly is that these words tell me nothing about what that function does unless I'm extremely familiar with the intricacies of CMake script. I think we want to make CMake easier to use, not harder. For many folks the current regex is just fine, so they will not have to do anything. Only the people that have hit some sort of wall with the regex will need to change. If we have a policy, I can see projects changing the minimum required version and having a subtle bug show up because the regex changed. I don't think we even know what all the differences are at this point... -Bill
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:06 PM, Bill Hoffman wrote: On 11/23/2011 11:43 AM, James Bigler wrote: Why can't this be solved with a policy? One problem of using an explicit TRE command is that if you want to write code that *could* be used in an older version of CMake you won't be able to use it. It could be, but that will not come without pain. I can see a future where this policy is kept off in many projects. We can wrap both the TRE and KWSys regex implementations in a cmRegularExpression that provides the same interface for both. Then we can implement the policy at that level. If the policy is set to OLD, use the KWSys implementation. If the policy is set to NEW, use the TRE implementation. If the policy is not set, then run *both* implementations and compare the results of matching. If they are different, warn about the policy not being set and favor the KWSys result for compatibility. I don't think we even know what all the differences are at this point... The above approach would reveal differences without breaking projects. Alex C. should produce some unit tests for the existing behavior, which we should have had all along. Even though we probably cannot think of everything every project is doing, at least we will know up front if something major is different. Policies are meant for use when the new behavior is clearly better than the old behavior in every way and should have been the way it was always done had it been available from the beginning. This is such a case. We are not switching between two desirable behaviors. The only argument in favor of the old behavior is that it is how it works now. -Brad
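Brad's proposal can be sketched as a wrapper that holds both engines and picks one according to the policy setting. This C++ illustration is hypothetical and is not CMake's real cmRegularExpression. The two engines are plain callbacks standing in for KWSys and TRE. The sketch compares only the match/no-match result; a real implementation would presumably also compare the matched ranges.

```cpp
#include <functional>
#include <string>

// Hypothetical policy states mirroring OLD / NEW / not set.
enum class RegexPolicy { Unset, Old, New };

// A matcher takes (pattern, text) and reports whether the pattern is found.
using MatchFn = std::function<bool(const std::string&, const std::string&)>;

// Illustrative wrapper, not CMake's real cmRegularExpression.
struct DualRegex {
  MatchFn oldEngine;  // stands in for the KWSys implementation
  MatchFn newEngine;  // stands in for the TRE implementation
  RegexPolicy policy = RegexPolicy::Unset;
  std::function<void(const std::string&)> warn;  // receives policy warnings

  bool Find(const std::string& pattern, const std::string& text) const {
    if (policy == RegexPolicy::Old) {
      return oldEngine(pattern, text);
    }
    if (policy == RegexPolicy::New) {
      return newEngine(pattern, text);
    }
    // Policy not set: run both, warn if they disagree, keep the old result.
    const bool oldResult = oldEngine(pattern, text);
    const bool newResult = newEngine(pattern, text);
    if (oldResult != newResult && warn) {
      warn("regex \"" + pattern + "\" matches differently with the new engine; "
           "set the policy to choose a behavior");
    }
    return oldResult;
  }
};
```

Because the unset case always returns the old result, existing projects keep their behavior. Meanwhile the warning surfaces exactly the differences that nobody in the thread can enumerate yet.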
Re: [cmake-developers] slow regex implementation in RegularExpression
Hi Bill, On 2011-11-23, at 10:36 AM, Bill Hoffman wrote: I am pretty sure the last time we talked about adding a new regex we talked about requiring explicit requests. I think this would be a much safer approach. I am really scared that this regex will not be compatible with the old one, and it will break lots of stuff in very subtle ways that are hard for people to detect. It is not that much code to have both. Where performance is an issue, we can swap it out, and when people need better regex they can use TRE as well. I don't think the pain will be worth getting rid of the old usage.

I agree with you on multiple points here.

[1] Your intuition is right. Just minutes before reading your email I tried to compile ITK using CMake+TRE. And there was at least one regex that TRE refused to compile. So yes, we cannot use TRE for *all* regex needs in CMake.

[2] It is not that much code to have both. I agree, and I think Brad King does too. The patch I submitted does exactly that:
 - old regex code stays unchanged --- Source/kwsys/RegularExpression.{cxx,hxx.in}
 - new TRE regex code added --- Utilities/cmtre
So, yes, we intend to keep both.

[3] I think there's a simple solution to our problem. I think we can:
 - solve the performance problem
 - keep the old behaviour, i.e. not break any projects out there
The solution is to use the old regex code everywhere, except for the very specific place where it causes problems. By looking at the bug report for 12381 (http://public.kitware.com/Bug/view.php?id=12381), you can see that the only place we have to use TRE is: Source/CTest/cmCTestBuildHandler.cxx

What cmCTestBuildHandler does is very simple: for every single line of output from the build process it tries to match about 100 regular expressions, in order to find error or warning messages.
These 100 regular expressions are defined in the same file in four static arrays that look something like this:

    static const char* cmCTestErrorMatches[] = {
      "^[Bb]us [Ee]rror",
      "^[Ss]egmentation [Vv]iolation",
      "^[Ss]egmentation [Ff]ault",
      ":.*[Pp]ermission [Dd]enied",
      "([^ :]+):([0-9]+): ([^ \\t])",
      "([^:]+): error[ \\t]*[0-9]+[ \\t]*:",
      // AND SO ON …
    };
    // + other 3 arrays

In this case, it is safe to use TRE, because we (the CMake developers) write these regular expressions, and we can make sure they work with TRE. All other regular expressions, including those written by users in their CMakeLists.txt files, will run on the old regex code, and thus behave normally.

CONCLUSION:
 - we can keep old behaviour and solve the performance problem
 - solution in part [3]

If this solution is acceptable, I'll have to recreate the patch. sincerely, Alex Ciobanu
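The per-line work Alex describes amounts to trying each pattern in turn until one matches. The C++ sketch below uses `std::regex` (ECMAScript grammar) purely as a stand-in for whichever engine CTest would use. `ErrorMatchers` and `FirstErrorMatch` are hypothetical names, and the patterns are the six quoted above. The sketch compiles the patterns once, since constructing a regex is typically much more expensive than reusing it.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// The six cmCTestErrorMatches patterns quoted above, compiled once on first use.
// std::regex is only a stand-in here, not the engine CTest actually uses.
const std::vector<std::regex>& ErrorMatchers() {
  static const std::vector<std::regex> matchers = [] {
    const char* patterns[] = {
        "^[Bb]us [Ee]rror",
        "^[Ss]egmentation [Vv]iolation",
        "^[Ss]egmentation [Ff]ault",
        ":.*[Pp]ermission [Dd]enied",
        "([^ :]+):([0-9]+): ([^ \\t])",
        "([^:]+): error[ \\t]*[0-9]+[ \\t]*:",
    };
    std::vector<std::regex> compiled;
    for (const char* p : patterns) {
      compiled.emplace_back(p);
    }
    return compiled;
  }();
  return matchers;
}

// Returns the index of the first pattern found anywhere in the line, or -1.
int FirstErrorMatch(const std::string& line) {
  const std::vector<std::regex>& matchers = ErrorMatchers();
  for (std::size_t i = 0; i < matchers.size(); ++i) {
    if (std::regex_search(line, matchers[i])) {
      return static_cast<int>(i);
    }
  }
  return -1;
}
```

With roughly 100 patterns, a line that matches nothing costs about 100 searches. This is why the raw speed of the engine dominates on verbose Xcode logs.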
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:20 PM, Alexandru Ciobanu wrote: to compile ITK using CMake+TRE. And there was at least one regex that TRE refused to compile. What was it, and where in the ITK code is it? Thanks, -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
On 2011-11-23, at 12:24 PM, Brad King wrote: On 11/23/2011 12:20 PM, Alexandru Ciobanu wrote: to compile ITK using CMake+TRE. And there was at least one regex that TRE refused to compile. What was it, and where in the ITK code is it? The regex in question is:

    ^[^][:/*?]+\$

And it appears at this location in the ITK source tree: CMake/ExternalData.cmake:347

And the expression is correct, because you're allowed to have the ] metacharacter inside a [^xyz] class if it comes immediately after ^. TRE does not do it the same way, see (http://laurikari.net/tre/documentation/regex-syntax/ the Bracket expressions section): To include a literal ] in the list, make it either the first item, the second endpoint of a range, or enclose it in [. and .]. sincerely, Alex
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ And it appears at this location in the ITK source tree: CMake/ExternalData.cmake:347 And the expression is correct, because you're allowed to have the ] metacharacter inside a [^xyz] class if it comes immediately after ^. Ironically I was the one that wrote that regex ;) TRE does not do it the same way, see (http://laurikari.net/tre/documentation/regex-syntax/ the Bracket expressions section): Interesting, thanks. -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ And it appears at this location in the ITK source tree: CMake/ExternalData.cmake:347 And the expression is correct, because you're allowed to have the ] metacharacter inside a [^xyz] class if it comes immediately after ^. Ironically I was the one that wrote that regex ;) TRE does not do it the same way, see (http://laurikari.net/tre/documentation/regex-syntax/ the Bracket expressions section): Wait, that documentation does say the same thing:

    bracket-expression ::= [ item+ ] | [^ item+ ]

To include a literal ] in the list, make it either the first item

That's exactly what this regex does. It uses the second production rule in the above grammar fragment and puts the ']' first after '^'. -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad
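Brad's diagnosis can be illustrated with a toy parser for a single bracket expression. In the traditional reading, a `]` right after `[^` is literal and `[` is an ordinary member. POSIX, by contrast, makes `[:` special inside a bracket expression: it opens a class name such as `[:digit:]` that must end in `:]`. The C++ sketch below is hypothetical (`ParseBracket` and `BracketSet` are illustrative names) and is not how TRE or KWSys are written. It only shows why the bracket part of `^[^][:/*?]+\$` parses under one reading and fails under the other. That outcome is consistent with Brad's guess but is not a confirmed account of TRE's parser.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>

// The members of one bracket expression such as "[^abc]".
struct BracketSet {
  bool negated = false;
  std::set<char> chars;
};

// Parses the bracket expression whose '[' is at p[pos]; returns std::nullopt on
// a syntax error. With recognizeClasses == false, '[' is an ordinary member
// (traditional behavior); with true, "[:" opens a class name that must end in ":]".
std::optional<BracketSet> ParseBracket(const std::string& p, std::size_t pos,
                                       bool recognizeClasses) {
  BracketSet result;
  std::size_t i = pos + 1;
  if (i < p.size() && p[i] == '^') {
    result.negated = true;
    ++i;
  }
  // A ']' immediately after "[" or "[^" is a literal member, not the end.
  if (i < p.size() && p[i] == ']') {
    result.chars.insert(']');
    ++i;
  }
  while (i < p.size() && p[i] != ']') {
    if (recognizeClasses && p[i] == '[' && i + 1 < p.size() && p[i + 1] == ':') {
      const std::size_t end = p.find(":]", i + 2);
      if (end == std::string::npos) {
        return std::nullopt;  // "[:" with no closing ":]"
      }
      // A real engine would add the named class's characters; this sketch
      // only checks that the class name is terminated.
      i = end + 2;
      continue;
    }
    result.chars.insert(p[i]);
    ++i;
  }
  if (i >= p.size()) {
    return std::nullopt;  // no closing ']'
  }
  return result;
}
```

Under the traditional reading, the expression is the negated set of `]`, `[`, `:`, `/`, `*`, `?`. This is what the ExternalData.cmake regex intends. The class-aware reading never finds a `:]`, so it rejects the pattern outright.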
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 12:51 PM, Brad King wrote: On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad I am still very wary about this policy. For 99% of folks the current regex is just fine. Making them eventually change to get the new regex is making them do work that they don't need or want. I would rather have two APIs. I just don't see the big upside of TRE, and I see this causing pain for lots and lots of folks if we push them to make the change. CMake has most likely 100,000 or more users at this point. A change like this could easily inflict man-years of effort onto the world, and should not be taken lightly. -Bill
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/23/2011 12:51 PM, Brad King wrote: On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad I am still very wary about this policy. For 99% of folks the current regex is just fine. Making them eventually change to get the new regex is making them do work that they don't need or want. I would rather have two APIs. I just don't see the big upside of TRE, and I see this causing pain for lots and lots of folks if we push them to make the change. CMake has most likely 100,000 or more users at this point. A change like this could easily inflict man-years of effort onto the world, and should not be taken lightly. -Bill Big upside: (quoting from Alexandru Ciobanu's email of Nov. 17th earlier in this thread) The impact on the build time is pretty dramatic: CMake: 7h39m CMake + TRE: 1h06m
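For scale, the two build times quoted above correspond to roughly a sevenfold speedup:

$$
\frac{7\,\mathrm{h}\,39\,\mathrm{min}}{1\,\mathrm{h}\,06\,\mathrm{min}} = \frac{459\ \mathrm{min}}{66\ \mathrm{min}} \approx 6.95,
\qquad 459\ \mathrm{min} - 66\ \mathrm{min} = 393\ \mathrm{min} = 6\,\mathrm{h}\,33\,\mathrm{min}\ \text{saved per build}.
$$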
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/23/2011 12:51 PM, Brad King wrote: On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad I am still very wary about this policy. For 99% of folks the current regex is just fine. Making them eventually change to get the new regex is making them do work that they don't need or want. I would rather have two APIs. I just don't see the big upside of TRE, and I see this causing pain for lots and lots of folks if we push them to make the change. CMake has most likely 100,000 or more users at this point. A change like this could easily inflict man-years of effort onto the world, and should not be taken lightly. Couldn't they defer by setting the policy to OLD? If they bump the minimum version, the user knows that backward-incompatible changes may be introduced. If there were a way to verify that both implementations produce the same output, that should reduce the possibility of subtle bugs. Marcus
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wed, Nov 23, 2011 at 2:09 PM, David Cole david.c...@kitware.com wrote: On Wed, Nov 23, 2011 at 2:03 PM, Bill Hoffman bill.hoff...@kitware.com wrote: On 11/23/2011 12:51 PM, Brad King wrote: On 11/23/2011 12:48 PM, Brad King wrote: On 11/23/2011 12:43 PM, Brad King wrote: On 11/23/2011 12:34 PM, Alexandru Ciobanu wrote: The regex in question is: ^[^][:/*?]+\$ To include a literal ] in the list, make it either the first item It must be the [: in this regex that TRE sees as special since it allows expressions like [:digit:] inside a bracket expression. Still, this is a case that my proposed policy would pick up. -Brad I am still very wary about this policy. For 99% of folks the current regex is just fine. Making them eventually change to get the new regex is making them do work that they don't need or want. I would rather have two APIs. I just don't see the big upside of TRE, and I see this causing pain for lots and lots of folks if we push them to make the change. CMake has most likely 100,000 or more users at this point. A change like this could easily inflict man-years of effort onto the world, and should not be taken lightly. -Bill Big upside: (quoting from Alexandru Ciobanu's email of Nov. 17th earlier in this thread) The impact on the build time is pretty dramatic: CMake: 7h39m CMake + TRE: 1h06m And although there is a big upside, we do still have to be careful. We have to remember that regexes are used in the context of ctest -D invocations, ctest -S script running and cmake -P running, too, where policies are not really a reliable mechanism.
So in addition to having a careful policy, we also have to decide what to do in those cases. The case that is in question here for the big performance gain is ctest running and filtering build output based on regexes. No cmake policy mechanism in sight for that scenario.
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wed, Nov 23, 2011 at 3:24 PM, Sean McBride s...@rogue-research.com wrote: On Wed, 23 Nov 2011 14:03:20 -0500, Bill Hoffman said: For 99% of folks the current regex is just fine. AFAICT, this performance bug affects 100% of Xcode generator users. Even looking at CMake's dashboard, you can see the difference, just search it for 'xcode'. ex: 100% of Xcode users that use ctest to build. I would still put that in the 1% of those that build with CMake. A build with CMake does not go via ctest, but is built on the command line or in the IDE Xcode, which will not have the regex slowdown. Changing just ctest somehow is a much smaller scope than changing every regex in all of CMake. I stand by my 99% are OK with the regex that they have. :) -Bill
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/23/2011 5:43 PM, Brad King wrote: On 11/23/2011 12:44 PM, Brad King wrote: However, the above does not need to stand in the way of solving the problem you're addressing. We can simply set that goal aside for now by not exposing TRE in the CMake language anywhere. Use it just for cmCTestBuildHandler. but people kept going on the above part of the debate ;) After some more thought, I've realized that no approach currently proposed is practical: - cmCTestBuildHandler can use a list of custom regular expressions so we cannot assume all of them will be compatible with TRE - As David Cole pointed out there are many places, like CTest's -R and -E options, that use regular expressions in contexts where we cannot possibly use a policy. Any attempt to do so in such places would just turn into a second API to set the policy in the local context of the regex. - If we add a second API like MATCHES = MATCHES_TRE then we would eventually need to do that in *every* place that offers regex matching. That would mean alternatives to the above -R and -E options and a lot more. - People could write code that passes a regex around in a variable. This would hide from the author of the regex the context in which it will be used, so it is unknown whether it is TRE or traditional. I propose we go back to an approach discussed the first time PCRE was proposed. The indication of the type of regex must be in the regex itself. IIRC the proposal was something like REGEX:...# old PCRE:... # PCRE Of course that is ambiguous too because the prefixes are valid expressions. Instead we can use a prefix that is not otherwise a valid expression. We can use an idea from Python: http://docs.python.org/library/re.html that defines expressions of the form (?...) which are not otherwise valid. In order to avoid conflict with future use of the constructs they define, we can use the comment form Python defines: (?#OLD)... # old (?#TRE)... # TRE This is quite easy to implement. 
Just take the currently proposed patch that replaces use of cmsys::RegularExpression with the new cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression). Inside the wrapper look for a leading comment of the above form to decide which regex impl to use internally. Then strip off the prefix and pass the rest of the regex to the underlying implementation. Once this is done update all the default warning and error regular expressions that CTest uses. Add the (?#TRE) prefix to them. This approach will solve the speed problem, give people access to the TRE extended features when they want it anywhere CMake already uses a regex, has no compatibility problems, is a very narrow second interface, and is extensible for future optional regex behavior. -Brad
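A minimal sketch of the prefix check described above, in C++ since that is what CMake is written in. The names RegexFlavor, ParsedRegex and ParseRegexPrefix are hypothetical and not part of any existing CMake or KWSys API. The sketch only decides which engine to use and strips the (?#OLD)/(?#TRE) comment; compiling the remaining pattern is left to whatever wrapper class ends up being used.

```cpp
#include <cassert>
#include <string>

// Which engine a pattern should be compiled with (hypothetical names).
enum class RegexFlavor { Old, TRE };

struct ParsedRegex {
  RegexFlavor flavor;
  std::string pattern; // pattern with any recognized prefix removed
};

// Inspect a leading Python-style comment such as "(?#TRE)" or "(?#OLD)".
// Patterns without a recognized prefix keep the traditional behavior.
ParsedRegex ParseRegexPrefix(const std::string& re)
{
  static const std::string kOld = "(?#OLD)";
  static const std::string kTre = "(?#TRE)";
  if (re.compare(0, kTre.size(), kTre) == 0) {
    return ParsedRegex{RegexFlavor::TRE, re.substr(kTre.size())};
  }
  if (re.compare(0, kOld.size(), kOld) == 0) {
    return ParsedRegex{RegexFlavor::Old, re.substr(kOld.size())};
  }
  return ParsedRegex{RegexFlavor::Old, re};
}
```

Defaulting the no-prefix case to the old engine is what keeps existing regexes untouched; only expressions that explicitly carry (?#TRE), such as the default CTest warning and error expressions, would switch.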
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/24/2011 12:34 AM, Brad King wrote: On 11/23/2011 5:43 PM, Brad King wrote: On 11/23/2011 12:44 PM, Brad King wrote: However, the above does not need to stand in the way of solving the problem you're addressing. We can simply set that goal aside for now by not exposing TRE in the CMake language anywhere. Use it just for cmCTestBuildHandler. but people kept going on the above part of the debate ;) After some more thought, I've realized that no approach currently proposed is practical: - cmCTestBuildHandler can use a list of custom regular expressions so we cannot assume all of them will be compatible with TRE - As David Cole pointed out there are many places, like CTest's -R and -E options, that use regular expressions in contexts where we cannot possibly use a policy. Any attempt to do so in such places would just turn into a second API to set the policy in the local context of the regex. - If we add a second API like MATCHES = MATCHES_TRE then we would eventually need to do that in *every* place that offers regex matching. That would mean alternatives to the above -R and -E options and a lot more. - People could write code that passes a regex around in a variable. This would hide from the author of the regex the context in which it will be used, so it is unknown whether it is TRE or traditional. I propose we go back to an approach discussed the first time PCRE was proposed. The indication of the type of regex must be in the regex itself. IIRC the proposal was something like REGEX:...# old PCRE:... # PCRE Of course that is ambiguous too because the prefixes are valid expressions. Instead we can use a prefix that is not otherwise a valid expression. We can use an idea from Python: http://docs.python.org/library/re.html that defines expressions of the form (?...) which are not otherwise valid. In order to avoid conflict with future use of the constructs they define, we can use the comment form Python defines: (?#OLD)... # old (?#TRE)... 
# TRE This is quite easy to implement. Just take the currently proposed patch that replaces use of cmsys::RegularExpression with the new cmFastRegularExpression wrapper (perhaps renamed cmRegularExpression). Inside the wrapper look for a leading comment of the above form to decide which regex impl to use internally. Then strip off the prefix and pass the rest of the regex to the underlying implementation. Once this is done update all the default warning and error regular expressions that CTest uses. Add the (?#TRE) prefix to them. This approach will solve the speed problem, give people access to the TRE extended features when they want it anywhere CMake already uses a regex, has no compatibility problems, is a very narrow second interface, and is extensible for future optional regex behavior. -Brad I like that proposal a lot, although I'm afraid it is a bit verbose. Some of my regexes are already pretty lengthy, pushing the 80-column limit. Michael
Re: [cmake-developers] slow regex implementation in RegularExpression
On 2011-11-17, at 3:59 PM, Brad King wrote: On 11/17/2011 3:19 PM, Alexandru Ciobanu wrote: I was able to make CMake use TRE, by changing the RegularExpression.{cxx,hxx.in} files. Those are down in Source/kwsys which is a directory shared by projects other than just CMake. We cannot touch the files there. Instead you will need to re-factor things to go through a wrapper. The first stage will just wrap up the KWSys regular expression API. The second stage will replace the implementation with TRE. - Does anyone see a problem if we add TRE in CMake the same way as ZLIB, CURL, etc? (i.e. in ./Utilities/) That should be fine. -Brad Hi, As Brad King suggested, instead of changing the files in Source/kwsys/, I created a wrapper class and made all the calls go through it. I also added the TRE library to Utilities/cmtre, and added CMAKE_USE_SYSTEM_TRE. I added the patch to the bug tracker: http://public.kitware.com/Bug/view.php?id=12381 Needless to say, the patch fixes the performance problem. To keep things simple I omitted several things: - TRE library bootstrapping (so now the Bootstrap test will fail) - the suggested policy to enable/disable approximate matching in TRE - proper checks when building TRE with CMake as done in its ./configure.ac Before I spend more time on that I would like to get some feedback, and namely: - Is the approach correct? - What next? sincerely, Alex Ciobanu
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/22/2011 1:50 PM, Alexandru Ciobanu wrote: As Brad King suggested, instead of changing the files in Source/kwsys/, I created a wrapper class and made all the calls go through it. Thanks. I also added the TRE library to Utilities/cmtre, and added CMAKE_USE_SYSTEM_TRE. I added the patch to the bug tracker: http://public.kitware.com/Bug/view.php?id=12381 Please add a note there indicating the CMake version (git commit sha1) on which the patch was based. Otherwise I cannot apply it cleanly. Needless to say, the patch fixes the performance problem. Great! To keep things simple I omitted several things: - TRE library bootstrapping (so now the Bootstrap test will fail) The KWSys implementation will not be going away, so we can fall back to that one during bootstrapping. - the suggested policy to enable/disable approximate matching in TRE Please read up on policies to make sure you understand them: http://www.cmake.org/Wiki/CMake/Policies http://www.cmake.org/cmake/help/cmake-2-8-docs.html#command:cmake_policy http://www.cmake.org/cmake/help/cmake-2-8-docs.html#section_Policies We will need a policy to know how to treat a regex containing one of the characters that behaves differently in TRE. The OLD behavior of the policy will escape them to get the old matching behavior. The NEW behavior of the policy will use the new matching features. We also need to identify the contexts that offer regex matching but have no way to set the policy. For those we need to decide if we can simply use the new behavior outright or provide another way to switch it. It is tempting to always require explicit requests for new TRE behavior, such as using TRE instead of REGEX in keyword locations, but one advantage of using a policy is that over time the old behavior will disappear completely from usage. - proper checks when building TRE with CMake as done in its ./configure.ac IOW, porting TRE to build properly with CMake, right? - Is the approach correct? Yes. 
I will review the patch in more detail next week and after I know where to apply it. - What next? We need to establish a transition plan, which mostly consists of the above policy discussion. -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
I also added the TRE library to Utilities/cmtre, and added CMAKE_USE_SYSTEM_TRE. I added the patch to the bug tracker: http://public.kitware.com/Bug/view.php?id=12381 Please add a note there indicating the CMake version (git commit sha1) on which the patch was based. Otherwise I cannot apply it cleanly. The commit that the patch is based on is: 5675ec5e493e01e10d9ad8d8b60eac62033f31c2 I added a note to the bug tracker. To keep things simple I omitted several things: - TRE library bootstrapping (so now the Bootstrap test will fail) The KWSys implementation will not be going away, so we can fall back to that one during bootstrapping. This is a good idea. - proper checks when building TRE with CMake as done in its ./configure.ac IOW, porting TRE to build properly with CMake, right? Yes, there are some checks, find headers, find types, etc. But all these operations have equivalents in CMake. So it should be straightforward. sincerely, Alex Ciobanu
Re: [cmake-developers] slow regex implementation in RegularExpression
Hi everyone, [ CMake + TRE ] I was able to make CMake use TRE, by changing the RegularExpression.{cxx,hxx.in} files. I ran the CMake tests, and 100% pass. See the attached log file. (NOTE: Bootstrap, complex, complexOne were initially not aware of the TRE dependency, but I fixed that easily.) [ Impact of using CMake + TRE on our builds ] We picked one of our build machines and replaced the ctest binary on it. The impact on the build time is pretty dramatic: CMake: 7h39m CMake + TRE: 1h06m This is a machine that has two cores. On machines that have more cores, the impact is even greater. On my 8 core machine, running a particular build task: CMake: 19m57s CMake + TRE: 1m30s [ Regular expressions syntax ] In terms of regular expressions syntax, the only difference that I've seen is that TRE treats the curly brackets "{" and "}" as special characters, because it uses them for its "approximate matching". Details here: http://laurikari.net/tre/documentation/regex-syntax/ The only CMake component that uses curly brackets in a regexp is: Modules/FindJNI.cmake but it was trivial to fix because they were used as mere delimiters. As mentioned earlier, after this change 100% of the tests pass. [ Implications ] Note that CTest is *not* the only component that would benefit from faster regular expressions. I've found at least one other reported case where regular expressions were too slow in CMake: http://public.kitware.com/Bug/print_bug_page.php?bug_id=5537 Since Glob uses RegularExpression, I would not be surprised if CMake+TRE will be faster on large code bases. CONCLUSION: - TRE is fast, benefits build times immensely QUESTION: - Does anyone see a problem if we add TRE in CMake the same way as ZLIB, CURL, etc? (i.e. in ./Utilities/) sincerely, Alex Ciobanu
Re: [cmake-developers] slow regex implementation in RegularExpression
On Thursday 17 November 2011, Alexandru Ciobanu wrote: Hi everyone, [ CMake + TRE ] I was able to make CMake use TRE, by changing the RegularExpression.{cxx,hxx.in} files. I ran the CMake tests, and 100% pass. See the attached log file. (NOTE: Bootstrap, complex, complexOne were initially not aware of TRE dependency, but I fixed that easily). Cool :-) [ Impact of using CMake + TRE on our builds ] We picked one of our build machines and replaced the ctest binary on it. The impact on the build time is pretty dramatic: CMake: 7h39m CMake + TRE: 1h06m This is a machine that has two cores. On machines that have more cores, the impact is even greater. On my 8 core machine, running a particular build task: CMake: 19m57s CMake + TRE: 1m30s [ Regular expressions syntax ] In terms of regular expressions syntax, the only difference that I've seen is that TRE treats the curly brackets { and } as special characters, because it uses them for its approximate matching. Details here: http://laurikari.net/tre/documentation/regex-syntax/ The only CMake component that uses curly brackets in a regexp is: Modules/FindJNI.cmake but it was trivial to fix because they were used as mere delimiters. Well, but there are cmake files out there (i.e. all existing cmake-based projects) which also must behave basically exactly the same as before, otherwise their builds might break. Not sure how to achieve this. A policy ? Alex
Re: [cmake-developers] slow regex implementation in RegularExpression
On 2011-11-17, at 3:26 PM, Alexander Neundorf wrote: [ Regular expressions syntax ] In terms of regular expressions syntax, the only difference that I've seen is that TRE treats the curly brackets { and } as special characters, because it uses them for its approximate matching. Details here: http://laurikari.net/tre/documentation/regex-syntax/ The only CMake component that uses curly brackets in a regexp is: Modules/FindJNI.cmake but it was trivial to fix because they were used as mere delimiters. Well, but there are cmake files out there (i.e. all existing cmake-based projects) which also must behave basically exactly the same as before, otherwise their builds might break. Not sure how to achieve this. A policy? Actually it is very easy to make it transparent and thus not need to modify any .cmake files. We just need to escape the curly brackets ({ -> \{ and } -> \}) in the regular expression before compiling it. This way we'll have full compatibility with the previous regexp syntax. sincerely, Alex Ciobanu
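A sketch of that escaping step, assuming (as the message implies) that TRE matches \{ and \} as literal braces. EscapeBracesForTRE is a hypothetical name. A blind search-and-replace would also rewrite braces that are already escaped, or that sit inside a [...] list where POSIX treats a backslash as a literal character, so this version copies those parts through unchanged. It does not try to handle POSIX [:class:] names, which the old syntax does not support anyway.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: rewrite a pattern written for the old KWSys syntax so
// that '{' and '}' are matched literally by a TRE/POSIX-style engine.
std::string EscapeBracesForTRE(const std::string& re)
{
  std::string out;
  out.reserve(re.size() + 8);
  bool inBracket = false;
  for (std::string::size_type i = 0; i < re.size(); ++i) {
    char c = re[i];
    if (inBracket) {
      // Inside [...] braces are already literal; copy until the closing ']'.
      out += c;
      if (c == ']') {
        inBracket = false;
      }
      continue;
    }
    if (c == '\\' && i + 1 < re.size()) {
      // Keep existing escapes untouched, e.g. "\{" stays "\{".
      out += c;
      out += re[++i];
    } else if (c == '[') {
      out += c;
      // A leading '^', and a ']' right after '[' or '[^', belong to the set.
      if (i + 1 < re.size() && re[i + 1] == '^') {
        out += re[++i];
      }
      if (i + 1 < re.size() && re[i + 1] == ']') {
        out += re[++i];
      }
      inBracket = true;
    } else if (c == '{' || c == '}') {
      out += '\\';
      out += c;
    } else {
      out += c;
    }
  }
  return out;
}
```

For example, a{2} becomes a\{2\}, while [{}] and an already-escaped \{ are left alone.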
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/17/2011 3:19 PM, Alexandru Ciobanu wrote: I was able to make CMake use TRE, by changing the RegularExpression.{cxx,hxx.in} files. Those are down in Source/kwsys which is a directory shared by projects other than just CMake. We cannot touch the files there. Instead you will need to re-factor things to go through a wrapper. The first stage will just wrap up the KWSys regular expression API. The second stage will replace the implementation with TRE. - Does anyone see a problem if we add TRE in CMake the same way as ZLIB, CURL, etc? (i.e. in ./Utilities/) That should be fine. -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/17/2011 4:28 PM, Sean McBride wrote: Has using the POSIX regex.h APIs been ruled out? Windows? -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
Hi, I was successful in making CMake work with PCRE. As expected, it was straightforward. The problem is that PCRE is also slow. So, I tested the same string and regex with multiple different libraries in order to assess performance. The regular expression in question is: ([^:]+): warning[ \t]*[0-9]+[ \t]*: The string is a 6k character string, a typical compiler command line. (See my first message for sample code.) For each library the steps are: - regcomp() the regular expression - regexec() the expression on the string Here is how much time it takes to process the string *one* time: current CMake -- 860ms TRex -- 680ms PCRE -- 610ms ( with pcre_exec() ) PCRE -- 990ms ( with pcre_dfa_exec() ) re2 -- 0.085ms /usr/include/regex.h -- 0.075ms As can be seen, re2 and the standard regex.h are orders of magnitude faster in executing this particular regular expression. The difference between PCRE and re2 is also confirmed by this study: http://swtch.com/~rsc/regexp/regexp3.html CONCLUSION: - PCRE is not fast enough QUESTION: - is there a reason we shouldn't use the standard regex.h? sincerely, Alex Ciobanu On 2011-11-15, at 10:30 AM, Pau Garcia i Quiles wrote: Hi, If it's of any help, I used the pcrecpp library by Google (it's part of PCRE). With pcrecpp, most operations were only 1-3 lines long. The only problem I found is PCRE provided no way to get the previous/next match, which CMake needs. On Tue, Nov 15, 2011 at 4:25 PM, Alexandru Ciobanu a...@rogue-research.com wrote: Hi Bill and Pau, I am currently working on adding PCRE to CMake. Chances are very high that it will work, given the very similar comp()/exec() API calls in both implementations. I'll let you know about the results soon. Alex On 2011-11-14, at 10:31 PM, Bill Hoffman wrote: On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote: Bill, I think the current incarnation of regexps in CMake should be kept for compatibility reasons. Yes, of course. Adding PCRE is not difficult, just time consuming.
The implementation I'd do would be an additional abstraction layer: - For the current BRE implementation, it would be a 1:1 call match - For the PCRE implementation, it would keep match status, count, next/previous iterators, etc. So, for this case I would be interested to hear from Alex to see if swapping out the regex will fix the ctest performance issue. It is a nice isolated place to give PCRE a try. -Bill -- Pau Garcia i Quiles http://www.elpauer.org (Due to my workload, I may need 10 days to answer)
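For reference, here is a minimal sketch of using the standard POSIX API asked about above with this pattern. MatchesWarning is a hypothetical helper; regcomp(), regexec() and regfree() are the standard <regex.h> calls, and REG_EXTENDED is needed because the pattern uses + and (...). <regex.h> is not available with the Windows toolchains (hence Brad's "Windows?"), so this only covers POSIX systems. For brevity it compiles the pattern on every call; a real caller would compile once and reuse the regex_t.

```cpp
#include <cassert>
#include <regex.h>
#include <string>

// Returns true if `line` matches the CTest-style warning pattern quoted in
// this thread, using the POSIX <regex.h> API (POSIX systems only).
bool MatchesWarning(const std::string& line)
{
  // "\t" here is a C++ escape producing a real tab inside the bracket list.
  const char* pattern = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";
  regex_t re;
  if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return false; // treat a pattern that fails to compile as "no match"
  }
  int rc = regexec(&re, line.c_str(), 0, nullptr, 0);
  regfree(&re);
  return rc == 0;
}
```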
Re: [cmake-developers] slow regex implementation in RegularExpression
Hi Brad, [1] On 11/16/2011 12:44 PM, Alexandru Ciobanu wrote: For each library the steps are: - regcomp() the regular expression - regexec() the expression on the string Can you time each of these steps separately for each library? I would not be surprised if the compilation time is the bottleneck. The evaluation and matching of a given string just follows a DFA which should be pretty fast. If it turns out that compilation is the bottleneck then we should refactor things to make sure CTest compiles each regex at most once so we can re-use the same DFA every time. This is how I run the tests (pseudocode): regcomp() repeat 1000 times: regexec() The times I reported are the total run times divided by 1000. For the slow ones (TRex, PCRE, CMake regexp) I have to repeat only 10 times, otherwise I wait too long. So it seems that regcomp() is not the problem in this case. [2] I have just tested another library - TRE. It performs well; I will put it in context: current CMake -- 860ms TRex -- 680ms PCRE -- 610ms ( with pcre_exec() ) PCRE -- 990ms ( with pcre_dfa_exec() ) re2 -- 0.085ms /usr/include/regex.h -- 0.075ms TRE -- 0.3ms ( NEW ) Advantages of TRE: - API very similar to standard regex.h (i.e. easy to integrate with CMake) - supports wide characters - compiles on many platforms Windows, AIX, HP-UX, you name it. What do you think about TRE? sincerely, Alex Ciobanu
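The measurement loop described above (compile once, time many regexec() calls, divide by the count) can be sketched with POSIX <regex.h> and <chrono>. TimeRegexec and TimingResult are hypothetical names. The numbers it produces depend entirely on the machine and on which library sits behind <regex.h>, so nothing here reproduces the figures quoted in the thread.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <regex.h>
#include <string>

struct TimingResult {
  bool matched;     // result of the last regexec() call
  double avgMicros; // mean wall-clock time per regexec(), in microseconds
};

// Compile once, then run regexec() `iterations` times and report the average,
// mirroring the "regcomp(); repeat N times: regexec()" loop described above.
TimingResult TimeRegexec(const char* pattern, const std::string& subject,
                         int iterations)
{
  TimingResult result{false, -1.0};
  regex_t re;
  if (iterations <= 0 ||
      regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
    return result; // nothing was compiled, so nothing to free
  }
  auto start = std::chrono::steady_clock::now();
  int rc = REG_NOMATCH;
  for (int i = 0; i < iterations; ++i) {
    rc = regexec(&re, subject.c_str(), 0, nullptr, 0);
  }
  auto stop = std::chrono::steady_clock::now();
  regfree(&re);
  std::chrono::duration<double, std::micro> total = stop - start;
  result.matched = (rc == 0);
  result.avgMicros = total.count() / iterations;
  return result;
}
```

Timing regcomp() separately, as Brad asks, would just mean wrapping the regcomp() call in its own pair of steady_clock::now() calls.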
Re: [cmake-developers] slow regex implementation in RegularExpression
On Wednesday 16 November 2011, Alexandru Ciobanu wrote: Hi Brad, ... Advantages of TRE: - API very similar to standard regex.h (i.e. easy to integrate with CMake) - supports wide characters - compiles on many platforms Windows, AIX, HP-UX, you name it. What do you think about TRE? http://laurikari.net/tre/about/ BSD licensed; gcc, IBM, HP and Sun compilers supported, also MSVC, including version 6. So from that side it looks good. Docs for the supported syntax: http://laurikari.net/tre/documentation/regex-syntax/ Alex
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/16/2011 2:12 PM, Alexandru Ciobanu wrote: This is how I run the tests (pseudocode): regcomp() repeat 1000 times: regexec() Thanks for the explanation. TRex -- 680ms PCRE -- 610ms ( with pcre_exec() ) PCRE -- 990ms ( with pcre_dfa_exec() ) re2 -- 0.085ms /usr/include/regex.h -- 0.075ms TRE -- 0.3ms The performance variation is interesting. It is probably worthwhile to use a profiling tool (such as valgrind --tool=callgrind and kcachegrind) to see where PCRE is spending its time. Advantages of TRE: - API very similar to standard regex.h (i.e. easy to integrate with CMake) - supports wide characters - compiles on many platforms Windows, AIX, HP-UX, you name it. What do you think about TRE? It looks like a promising candidate. Thanks, -Brad
Re: [cmake-developers] slow regex implementation in RegularExpression
A little off topic, but I am wondering if the ctest performance issue for xcode could be fixed without changing the regex. The problem with xcode is that it spits out very verbose output. I am wondering if some short circuit stuff could be put in place. Maybe do a string compare of the first bit of every line that looks for stuff that could not have an error in it, and only if it might have an error, do we pass it to the regex call. Basically, if we could reduce the amount of data going into the regex stuff, it should work as well as it does for other compilers. The place that this code could go is cmakexbuild.cxx, which already strips out all lines that start with setenv. Maybe even some hard coded stuff that looks for errors and only puts those out, i.e. only output lines from cmakexbuild.cxx if there are errors. -Bill
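A sketch of the short-circuit idea, assuming that a cheap substring test for a few keywords is enough to decide whether a line could contain a diagnostic. MightBeDiagnostic, PrefilterLines and the keyword list are hypothetical; they are not CTest's default error/warning expressions and not the real cmakexbuild.cxx code. Only the setenv rule mirrors what the message says cmakexbuild.cxx already does. std::string::find is a plain linear scan, so even a 6k-character compiler command line costs little compared with a slow regex match.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Cheap pre-check run before any regex: lines that cannot plausibly be a
// diagnostic are dropped so the expensive matcher never sees them.
// The keyword list is an assumption for illustration only.
bool MightBeDiagnostic(const std::string& line)
{
  // cmakexbuild.cxx already drops lines starting with "setenv".
  if (line.compare(0, 6, "setenv") == 0) {
    return false;
  }
  static const char* const kKeywords[] = {"error", "warning", "Error",
                                          "Warning"};
  for (const char* kw : kKeywords) {
    if (line.find(kw) != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Keep only the lines worth handing to the regex-based matcher.
std::vector<std::string> PrefilterLines(const std::vector<std::string>& lines)
{
  std::vector<std::string> kept;
  for (const std::string& l : lines) {
    if (MightBeDiagnostic(l)) {
      kept.push_back(l);
    }
  }
  return kept;
}
```

As Sean points out in his reply, any fixed keyword list depends on the exact wording of xcodebuild and compiler output, so a filter like this should err on the side of keeping lines.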
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/16/2011 4:11 PM, Sean McBride wrote: The downside is that this solution would be fragile. xcodebuild's output is not guaranteed to be the same forever, it's not like a public API. Already today, gcc and clang output pretty differently. I'm a little worried this would bite us. Besides, improving regex performance would be a win everywhere, not just in this case. Still, we are already filtering the output some, and it is way too verbose which is why it is the only place where this is a problem. It might be worth exploring as a faster path to getting things working for you. -Bill
Re: [cmake-developers] slow regex implementation in RegularExpression
Hi, If it's of any help, I used the pcrecpp library by Google (it's part of PCRE). With pcrecpp, most operations were only 1-3 lines long. The only problem I found is PCRE provided no way to get the previous/next match, which CMake needs. On Tue, Nov 15, 2011 at 4:25 PM, Alexandru Ciobanu a...@rogue-research.com wrote: Hi Bill and Pau, I am currently working on adding PCRE to CMake. Chances are very high that it will work, given the very similar comp()/exec() API calls in both implementations. I'll let you know about the results soon. Alex On 2011-11-14, at 10:31 PM, Bill Hoffman wrote: On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote: Bill, I think the current incarnation of regexps in CMake should be kept for compatibility reasons. Yes, of course. Adding PCRE is not difficult, just time consuming. The implementation I'd do would be an additional abstraction layer: - For the current BRE implementation, it would be a 1:1 call match - For the PCRE implementation, it would keep match status, count, next/previous iterators, etc. So, for this case I would be interested to hear from Alex to see if swapping out the regex will fix the ctest performance issue. It is a nice isolated place to give PCRE a try. -Bill -- Pau Garcia i Quiles http://www.elpauer.org (Due to my workload, I may need 10 days to answer)
Re: [cmake-developers] slow regex implementation in RegularExpression
Bill, I think the current incarnation of regexps in CMake should be kept for compatibility reasons. Adding PCRE is not difficult, just time consuming. The implementation I'd do would be an additional abstraction layer: - For the current BRE implementation, it would be a 1:1 call match - For the PCRE implementation, it would keep match status, count, next/previous iterators, etc. On Mon, Nov 14, 2011 at 7:30 PM, Bill Hoffman bill.hoff...@kitware.com wrote: Sorry for the top post... However, if the issue with ctest being slow can be fixed by using PCRE in CMake, that is good news. We can just link in the library, and replace that small part of CMake internal code that has the performance problem. This should not break backwards compatibility. It also gives us a way to slowly bring PCRE into CMake. Alex, is there a way you can try PCRE in CMake to see if it fixes the problem? -Bill On 11/14/2011 1:13 PM, Pau Garcia i Quiles wrote: Hi, Check this: A wish a day 11: Perl Compatible Regular Expressions in CMake http://www.elpauer.org/?p=684 Unfortunately the student turned out to be a total fraud: he knew nothing about CMake, regular expressions (much less PCRE!), git, and could barely manage with C/C++. After months of explaining *really* basic stuff (such as the difference between a static and a shared library), he silently gave up. I do have an initial implementation and extensive information on how to implement PCRE in CMake. It's just I don't have enough spare time to do that, and at work I cannot justify investing so much time in CMake for free (for now, we don't need advanced regular expressions) On Mon, Nov 14, 2011 at 6:57 PM, Alexandru Ciobanu a...@rogue-research.com wrote: Hi, Our team is affected by issue 0012381, which causes extremely poor performance in CTest.
Details here: http://public.kitware.com/Bug/**view.php?id=12381http://public.kitware.com/Bug/view.php?id=12381 I've created a small test case that demonstrates the problem. Please find the .cpp file attached. From what I see, the RegularExpression class uses Henry Spencer regex implementation, which is known to be slow for some cases. On my machine, the attached example runs in 0.8 sec. Just to process one string! $ time ./repr real 0m0.865s user 0m0.862s sys 0m0.002s Grep can process 100k such strings in 0.5 sec (which includes reading a 570MB file from disk): $ wc -l big.str.txt 10 big.str.txt $ ls -lh big.str.txt -rw-r--r-- 1 alex staff 572M 14 Nov 12:30 big.str.txt $ time grep ([^:]+): warning[ \t]*[0-9]+[ \t]*: big.str.txt real 0m0.525s user 0m0.255s sys 0m0.269s I see three ways to fix this problem: A) use a trusted 3rd party regex library, like re2 or pcre B) find another self-contained regex implementation C) try to use the standard POSIX regex available in regex.h on most systems I tried to find another self-contained regex implementation, that we could use. I found Tiny REX, but it is as slow, in this case, as Henry Spencer's implementation. So what do you think is the best way to proceed about this problem? sincerely, Alex Ciobanu -- Pau Garcia i Quiles http://www.elpauer.org (Due to my workload, I may need 10 days to answer) -- Powered by www.kitware.com Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html Please keep messages on-topic and check the CMake FAQ at: http://www.cmake.org/Wiki/CMake_FAQ Follow this link to subscribe/unsubscribe: http://public.kitware.com/cgi-bin/mailman/listinfo/cmake-developers
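Option C can be sketched with the POSIX regex.h API (regcomp, regexec, regfree), reusing the warning pattern from the grep command above. This is a minimal sketch assuming a POSIX system; the class PosixWarningMatcher is hypothetical and not part of CMake or CTest, and the thread reports no timings for this option, so whether the system engine actually avoids the slowdown is untested here.

```cpp
#include <cassert>
#include <cstddef>
#include <regex.h>
#include <stdexcept>
#include <string>

// Hypothetical sketch of option C: the warning pattern from the grep command,
// compiled once with the POSIX <regex.h> API and reused for every line.
class PosixWarningMatcher {
public:
  PosixWarningMatcher() {
    // "\t" in the C string literal is a real tab character; inside a POSIX
    // bracket expression a backslash would otherwise be taken literally.
    const char* pattern = "([^:]+): warning[ \t]*[0-9]+[ \t]*:";
    if (regcomp(&re_, pattern, REG_EXTENDED) != 0) {
      throw std::runtime_error("regcomp failed");
    }
  }
  ~PosixWarningMatcher() { regfree(&re_); }

  // regex_t owns internal state; copying it would lead to a double regfree.
  PosixWarningMatcher(const PosixWarningMatcher&) = delete;
  PosixWarningMatcher& operator=(const PosixWarningMatcher&) = delete;

  // Returns true if `line` looks like a warning; on success, stores the text
  // captured by the first group (the part before ": warning") in `file`.
  bool Match(const std::string& line, std::string* file) const {
    regmatch_t groups[2];
    if (regexec(&re_, line.c_str(), 2, groups, 0) != 0) {
      return false;  // REG_NOMATCH (or an error): treat as "not a warning"
    }
    // Group 1 is not optional, so it always participates in a match.
    assert(groups[1].rm_so >= 0);
    if (file != nullptr) {
      *file = line.substr(static_cast<std::size_t>(groups[1].rm_so),
                          static_cast<std::size_t>(groups[1].rm_eo - groups[1].rm_so));
    }
    return true;
  }

private:
  regex_t re_;
};
```

Compiling the pattern once and reusing the matcher matters here because build output is checked line by line; calling regcomp for every line would add avoidable cost on top of the matching itself.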
Re: [cmake-developers] slow regex implementation in RegularExpression
On 11/14/2011 6:08 PM, Pau Garcia i Quiles wrote:

Bill, I think the current incarnation of regexps in CMake should be kept for compatibility reasons.

Yes, of course.

Adding PCRE is not difficult, just time consuming. The implementation I'd do would be an additional abstraction layer:
- For the current BRE implementation, it would be a 1:1 call match
- For the PCRE implementation, it would keep match status, count, next/previous iterators, etc.

So, for this case I would be interested to hear from Alex to see if swapping out the regex will fix the ctest performance issue. It is a nice isolated place to give PCRE a try.

-Bill