Re: [go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-03 Thread Artur Vianna
Oops sorry, Powerset algo is actually O(n*2^n) time complexity

On Sat, 3 Apr 2021, 13:49 Artur Vianna,  wrote:

> An interesting approach would be the implementation of a regex engine that
> has optimizations just like a compiler (i think the stdlib one does that to
> an extent). Using powerset and hopcroft's where it can be used and leaving
> backtracks where it can't.
>
> That would be fun to implement but very confusing for the user and has the
> same security problems of a NFA backtracking regex, but the user would be
> able to choose the backtracking features, being aware that they pose
> security problems.
>
> Powerset algorithm is O(n²) too, so while converting the NFA to DFA some
> strings like (a*b*)^n can take forever basically, a possible security
> problem if your regex came from user input.
>
> On Sat, 3 Apr 2021, 13:31 bobj...@gmail.com,  wrote:
>
>> In *my* ideal world, both O(n) and backtracking implementations would be
>> available in the standard library.
>>
>> Most popular languages have only backtracking versions in their standard
>> libraries (Java, Python, Ruby). It's nice that Go has an O(n)
>> implementation. But, some regexes that can be written in the modern
>> backtracking implementations that support "atomic groups", etc. just cannot
>> be rewritten for O(n) implementations, sometimes resulting in having to
>> write a lot of code to achieve the equivalent result. The reason that O(n)
>> implementations lack those features is that they simply can't be done
>> without violating O(n). So why not let the programmer exercise the choice
>> -- O(n) where possible for denial-of-service safety or maybe better
>> performance, and backtracking when O(n) can't be done or maximum
>> performance is not important, and regexes incoming from the wild do not
>> happen.
>>
>> And... I suspect that well written regexes for backtracking can be nearly
>> as performant if backtracking is kept under control via skillful use of
>> those atomic features. It's the sloppy ones that are the problem.
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "golang-nuts" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to golang-nuts+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/golang-nuts/7afca073-a3ff-464c-8de4-8b2ba7c4d214n%40googlegroups.com
>> 
>> .
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAE%3DAWBUE4fFWWFOVzhF-WpMZZL6L15XCd-L3yAWj9k_CtkFr7Q%40mail.gmail.com.


Re: [go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-03 Thread Artur Vianna
An interesting approach would be the implementation of a regex engine that
has optimizations just like a compiler (i think the stdlib one does that to
an extent). Using powerset and hopcroft's where it can be used and leaving
backtracks where it can't.

That would be fun to implement but very confusing for the user and has the
same security problems of a NFA backtracking regex, but the user would be
able to choose the backtracking features, being aware that they pose
security problems.

Powerset algorithm is O(n²) too, so while converting the NFA to DFA some
strings like (a*b*)^n can take forever basically, a possible security
problem if your regex came from user input.

On Sat, 3 Apr 2021, 13:31 bobj...@gmail.com,  wrote:

> In *my* ideal world, both O(n) and backtracking implementations would be
> available in the standard library.
>
> Most popular languages have only backtracking versions in their standard
> libraries (Java, Python, Ruby). It's nice that Go has an O(n)
> implementation. But, some regexes that can be written in the modern
> backtracking implementations that support "atomic groups", etc. just cannot
> be rewritten for O(n) implementations, sometimes resulting in having to
> write a lot of code to achieve the equivalent result. The reason that O(n)
> implementations lack those features is that they simply can't be done
> without violating O(n). So why not let the programmer exercise the choice
> -- O(n) where possible for denial-of-service safety or maybe better
> performance, and backtracking when O(n) can't be done or maximum
> performance is not important, and regexes incoming from the wild do not
> happen.
>
> And... I suspect that well written regexes for backtracking can be nearly
> as performant if backtracking is kept under control via skillful use of
> those atomic features. It's the sloppy ones that are the problem.
>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/7afca073-a3ff-464c-8de4-8b2ba7c4d214n%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAE%3DAWBW9PaP%2BA%2BnKsBfGQgDWLJ73y%3DBpHyP%2B_r%2BPHrOvODDwqA%40mail.gmail.com.


[go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-03 Thread bobj...@gmail.com
In *my* ideal world, both O(n) and backtracking implementations would be 
available in the standard library.

Most popular languages have only backtracking versions in their standard 
libraries (Java, Python, Ruby). It's nice that Go has an O(n) 
implementation. But, some regexes that can be written in the modern 
backtracking implementations that support "atomic groups", etc. just cannot 
be rewritten for O(n) implementations, sometimes resulting in having to 
write a lot of code to achieve the equivalent result. The reason that O(n) 
implementations lack those features is that they simply can't be done 
without violating O(n). So why not let the programmer exercise the choice 
-- O(n) where possible for denial-of-service safety or maybe better 
performance, and backtracking when O(n) can't be done or maximum 
performance is not important, and regexes incoming from the wild do not 
happen.

And... I suspect that well written regexes for backtracking can be nearly 
as performant if backtracking is kept under control via skillful use of 
those atomic features. It's the sloppy ones that are the problem.

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/7afca073-a3ff-464c-8de4-8b2ba7c4d214n%40googlegroups.com.


[go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-02 Thread Amnon
*Some people, when confronted with a problem, think "I know, I'll use 
regular expressions." Now they have two problems.*

I would tend to agree with this statement. Regular expressions are much 
abused and over-used in situations where 
simpler parsing techniques would be more appropriate. Beyond a certain 
length regular expressions are hard to read,
and easy to get wrong.

But when you do need to use a regular expression, then it the 
implementation in the standard library is usually a good choice.
Its most important feature is O(n) runtime - runtime proportional to the 
input length, irrespective of the complexity of 
the regular expression. This prevents denial of service attacks based on 
sending regular expression which cause 
catastrophic backtracking and exponential runtime. The standard library 
omits some of the bells and whistles 
provided by PCRE and other libraries - but often for the good reason that 
these require backtracking, so are impossible
to implement while guaranteeing reasonable runtimes. 

So I would certainly consider using the standard library version, and 
rewriting your regular expressions to remove the use
of features not supported in the standard library.
On Friday, 2 April 2021 at 01:43:00 UTC+1 dlc...@gmail.com wrote:

> You can try my regexp2 engine that's based on the .NET engine: 
> https://github.com/dlclark/regexp2
>
> It supports atomic groups, has no dependencies, and doesn't need cgo. 
> Unfortunately it doesn't support Possessive Quantifier syntax, but 
> supposedly that may be serving as syntactic sugar for atomic groups and 
> look-ahead assertions (which it does support).
>
> Thanks,
> Doug
>
> On Thursday, April 1, 2021 at 2:54:35 PM UTC-5 bobj...@gmail.com wrote:
>
>> I have a use of regular expressions where I need the "atomic" matching 
>> features that I can get for several other languages, such as atomic 
>> matching groups and "possessive" quantifiers. I'm willing to forego the 
>> blazing performance of Go's standard library dfa matcher for this 
>> particular case. I cannot write an equivalent regexp for Go's "regexp", and 
>> the additional code I would have to write for equivalent functionality is 
>> prohibitive.
>>
>> I did find such an implementation in the ether, importable as "
>> github.com/h2so5/goback/regexp", It kind of works, but fails for my use 
>> case.
>>
>> Is there any reason to believe that downloaded modules such as this are 
>> well-tested and reliable?
>>
>> To pursue this problem,, I wrote small programs that perform the same 
>> operation in isolation in some other languages that have regular expression 
>> implementations the have the features I want, in particular
>>Java, whose java.util.regex module provides the desired features
>>Python, which has a module "regex" downloadable from the PyPI (Python 
>> program index) that replaces the standard "re" module but has my desired 
>> features
>>Also Go, using the downloaded module I mentioned
>>
>> Only the Go version fails. Of course, it's the fault of the downloaded 
>> library, not Go itself :)
>>
>> Anyone know of any other Go nfa implementations with the aforementioned 
>> features?
>>
>> As much as I appreciate the advantages Go's dfa "regexp" module, there 
>> are times when a more full-featured nfa implementation is useful. (And, a 
>> well-written regexp for nfa can be pretty fast if the aforementioned 
>> features are used to prevent excessive backtracking.)
>>
>> If anyone is interested in looking into the failure I experienced, I'd be 
>> happy to send the little programs I wrote to investigate this.
>>
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/d08ed242-fdef-4954-ab66-c1a7c479b9a6n%40googlegroups.com.


[go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-01 Thread Doug
You can try my regexp2 engine that's based on the .NET engine: 
https://github.com/dlclark/regexp2

It supports atomic groups, has no dependencies, and doesn't need cgo. 
Unfortunately it doesn't support Possessive Quantifier syntax, but 
supposedly that may be serving as syntactic sugar for atomic groups and 
look-ahead assertions (which it does support).

Thanks,
Doug

On Thursday, April 1, 2021 at 2:54:35 PM UTC-5 bobj...@gmail.com wrote:

> I have a use of regular expressions where I need the "atomic" matching 
> features that I can get for several other languages, such as atomic 
> matching groups and "possessive" quantifiers. I'm willing to forego the 
> blazing performance of Go's standard library dfa matcher for this 
> particular case. I cannot write an equivalent regexp for Go's "regexp", and 
> the additional code I would have to write for equivalent functionality is 
> prohibitive.
>
> I did find such an implementation in the ether, importable as "
> github.com/h2so5/goback/regexp", It kind of works, but fails for my use 
> case.
>
> Is there any reason to believe that downloaded modules such as this are 
> well-tested and reliable?
>
> To pursue this problem,, I wrote small programs that perform the same 
> operation in isolation in some other languages that have regular expression 
> implementations the have the features I want, in particular
>Java, whose java.util.regex module provides the desired features
>Python, which has a module "regex" downloadable from the PyPI (Python 
> program index) that replaces the standard "re" module but has my desired 
> features
>Also Go, using the downloaded module I mentioned
>
> Only the Go version fails. Of course, it's the fault of the downloaded 
> library, not Go itself :)
>
> Anyone know of any other Go nfa implementations with the aforementioned 
> features?
>
> As much as I appreciate the advantages Go's dfa "regexp" module, there are 
> times when a more full-featured nfa implementation is useful. (And, a 
> well-written regexp for nfa can be pretty fast if the aforementioned 
> features are used to prevent excessive backtracking.)
>
> If anyone is interested in looking into the failure I experienced, I'd be 
> happy to send the little programs I wrote to investigate this.
>
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/42ac52d7-2a7c-4ad7-93ff-4a9be447f463n%40googlegroups.com.


Re: [go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-01 Thread Marcin Romaszewicz
I have nothing useful to contribute to this discussion, however, I've been
doing this long enough to remember Jamie Zawinski's quote about regular
expressions :)

Some people, when confronted with a problem, think "I know, I'll use
> regular expressions." Now they have two problems.
>

https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/

On Thu, Apr 1, 2021 at 3:36 PM Amnon  wrote:

> Worth mentioning Russ Cox's series of blog posts on Regular Expressions
> and their implementation https://swtch.com/~rsc/regexp/
>
>
>
> On Thursday, 1 April 2021 at 21:21:14 UTC+1 Brian Candler wrote:
>
>> If you can live with cgo dependencies, then you might want to look at a
>> go wrapper around PCRE, which is the "go-to" implementation for
>> fully-featured regexps.
>>
>> e.g. https://github.com/rubrikinc/go-pcre  (untested by me, and it's one
>> of several forks)
>>
> --
> You received this message because you are subscribed to the Google Groups
> "golang-nuts" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to golang-nuts+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/golang-nuts/bb9db3d5-f9da-4324-80bf-48fbcc43b29en%40googlegroups.com
> 
> .
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CA%2Bv29LvukDp5KiR9HSzamqCW_RWBMT-dx8B%2Bh8fdB8kUu9vPwQ%40mail.gmail.com.


[go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-01 Thread Amnon
Worth mentioning Russ Cox's series of blog posts on Regular Expressions 
and their implementation https://swtch.com/~rsc/regexp/



On Thursday, 1 April 2021 at 21:21:14 UTC+1 Brian Candler wrote:

> If you can live with cgo dependencies, then you might want to look at a go 
> wrapper around PCRE, which is the "go-to" implementation for fully-featured 
> regexps.
>
> e.g. https://github.com/rubrikinc/go-pcre  (untested by me, and it's one 
> of several forks)
>

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/bb9db3d5-f9da-4324-80bf-48fbcc43b29en%40googlegroups.com.


[go-nuts] Re: Alternate implementations of regular expression matching?

2021-04-01 Thread Brian Candler
If you can live with cgo dependencies, then you might want to look at a go 
wrapper around PCRE, which is the "go-to" implementation for fully-featured 
regexps.

e.g. https://github.com/rubrikinc/go-pcre  (untested by me, and it's one of 
several forks)

-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/46c4c051-1355-4602-903b-6be32626d8d1n%40googlegroups.com.