Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-20 Thread Paul Eggert

On 3/20/24 20:57, arn...@skeeve.com wrote:

Note that I said EREs, which don't have to provide backreferences.


Ah, sorry, I missed that. Thanks for correcting me. In that case there 
should be a polynomial-time solution.





Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-20 Thread arnold
Note that I said EREs, which don't have to provide backreferences.

Arnold

Paul Eggert  wrote:

> On 3/20/24 01:40, arn...@skeeve.com wrote:
> > It's possible to write a POSIX compliant matcher for EREs that doesn't
> > have such problems; I know someone doing it.
>
> I think matching POSIX regular expressions in polynomial time is 
> equivalent to proving P=NP, i.e., you'll win the Turing Award if you 
> actually pull it off. This is due to back-references.
>
> See, for example:
>
> Câmpeanu C, Santean N. On pattern expression languages. 2007. Proc 
> AuthMathA. https://cs.smu.ca/~nic.santean/art/regex.pdf
>
> For a more programmer-friendly summary of the problem, and the subset of 
> POSIX REs that you can match more quickly, see:
>
> Pinto PED, Lopes JPA, de Brito MAS. A polynomial-time regular 
> expressions implementation. 2017. Cadernos Do IME - Série Informática, 
> 37(Junho), 22–36. https://doi.org/10.12957/cadinf.2016.13778



Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-20 Thread Paul Eggert

On 3/20/24 01:40, arn...@skeeve.com wrote:

It's possible to write a POSIX compliant matcher for EREs that doesn't
have such problems; I know someone doing it.


I think matching POSIX regular expressions in polynomial time is 
equivalent to proving P=NP, i.e., you'll win the Turing Award if you 
actually pull it off. This is due to back-references.


See, for example:

Câmpeanu C, Santean N. On pattern expression languages. 2007. Proc 
AuthMathA. https://cs.smu.ca/~nic.santean/art/regex.pdf


For a more programmer-friendly summary of the problem, and the subset of 
POSIX REs that you can match more quickly, see:


Pinto PED, Lopes JPA, de Brito MAS. A polynomial-time regular 
expressions implementation. 2017. Cadernos Do IME - Série Informática, 
37(Junho), 22–36. https://doi.org/10.12957/cadinf.2016.13778




Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-20 Thread arnold
Hi.

Paul Eggert wrote:
> In glibc (and Gnulib) the regular-expression code has long been 
> maintained under the philosophy that the code cannot handle arbitrary 
> regular expressions. Any code that lets the user specify an arbitrary 
> regular expression is suspect, and this includes Awk scripts. (This is 
> also true for C libraries other than glibc/Gnulib.)
> 
> It'd be nice if someone could fix regex bugs like these in the glibc 
> regex code, but nobody has stepped forward to do that, and frankly it's 
> low priority. In the meantime, don't write Awk scripts with adversarial 
> regexps.

Thanks. This is more or less what I expected, and it's fine with me.
But I had to do my duty as gawk maintainer and forward the report.

Bruno Haible  wrote:
> Stack overflow inside the regex engine is only one of the problems. The
> other one is quadratic (or even exponential) running time. Such a running
> time can have fatal practical consequences [1]. The RE2 regex syntax [2]
> was designed to avoid such problems. But here, we are using POSIX regexes,
> which will always exhibit worst-case exponential running times.
>
> Bruno
>
> [1] https://blog.cloudflare.com/cloudflare-outage/
> [2] https://en.wikipedia.org/wiki/RE2_(software)

It's possible to write a POSIX compliant matcher for EREs that doesn't
have such problems; I know someone doing it.  In any case, users
get what they ask for, it's up to them to understand what they're doing.

Thanks,

Arnold
>



Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-19 Thread Bruno Haible
Paul Eggert wrote:
> In glibc (and Gnulib) the regular-expression code has long been 
> maintained under the philosophy that the code cannot handle arbitrary 
> regular expressions. Any code that lets the user specify an arbitrary 
> regular expression is suspect, and this includes Awk scripts. (This is 
> also true for C libraries other than glibc/Gnulib.)
> 
> It'd be nice if someone could fix regex bugs like these in the glibc 
> regex code, but nobody has stepped forward to do that, and frankly it's 
> low priority. In the meantime, don't write Awk scripts with adversarial 
> regexps.

Stack overflow inside the regex engine is only one of the problems. The
other one is quadratic (or even exponential) running time. Such a running
time can have fatal practical consequences [1]. The RE2 regex syntax [2]
was designed to avoid such problems. But here, we are using POSIX regexes,
which will always exhibit worst-case exponential running times.

Bruno

[1] https://blog.cloudflare.com/cloudflare-outage/
[2] https://en.wikipedia.org/wiki/RE2_(software)





Re: Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-19 Thread Paul Eggert
In glibc (and Gnulib) the regular-expression code has long been 
maintained under the philosophy that the code cannot handle arbitrary 
regular expressions. Any code that lets the user specify an arbitrary 
regular expression is suspect, and this includes Awk scripts. (This is 
also true for C libraries other than glibc/Gnulib.)


It'd be nice if someone could fix regex bugs like these in the glibc 
regex code, but nobody has stepped forward to do that, and frankly it's 
low priority. In the meantime, don't write Awk scripts with adversarial 
regexps.




Forwarded: Segmentation Fault via recursive loop in Gawk

2024-03-19 Thread arnold
Hello.

Please see this report sent to the gawk list concerning regcomp.c.
I have attached his "POCFILE".

Thanks,

Arnold

> From: ttfish 
> Date: Tue, 19 Mar 2024 21:48:34 +0800
> Subject: Segmentation Fault via recursive loop in Gawk
> To: bug-g...@gnu.org
> Cc: secur...@gnu.org
>
> Content-Type: text/plain; charset="UTF-8"
>
> Dear GNU gawk developers,
>
> Greetings. I am writing to report a recursive loop bug found in gawk.
>
> ## Description
>
> The bug is located in the support/regcomp.c file within the parse_reg_exp
> function. The vulnerability involves function "parse_expression",
> "parse_branch" and "parse_sub_exp" and exists in latest stable release
> (gawk 5.3.0) and the latest master branch
> (ff873ce52bf6a1766935281883b74b49edc7d38f, updated on March 04, 2024). The
> inner variable of `preg`, `token`, `syntax` and `nest` would stick with
> unchanged values in loop calling and lead to segmentation fault.
>
> ## Proof of Concept
>
> The attached PoC could result segmentation fault and subsequent program
> termination.
>
> It could be reproduced by the attached PoC file with input:
>
> ```bash
> gawk -f POC-FILE {anyfile}
> ```
>
> The backtrace log could be found below:
>
> ```bash
> #4  0x006f3121 in parse_expression (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4466, err=0x75c09b20) at ./regcomp.c:2242
> #5  0x006f243d in parse_branch (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4466, err=0x75c09b20) at ./regcomp.c:2169
> #6  0x006ee668 in parse_reg_exp (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4466, err=0x75c09b20) at ./regcomp.c:2121
> #7  0x006f4e72 in parse_sub_exp (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4466, err=0x75c09b20) at ./regcomp.c:2456
> #8  0x006f3121 in parse_expression (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4465, err=0x75c09b20) at ./regcomp.c:2242
> #9  0x006f243d in parse_branch (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=4465, err=0x75c09b20) at ./regcomp.c:2169
> #10 0x006ee668 in parse_reg_exp (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
>
> # repeat ...
>
> #17868 0x006f3121 in parse_expression (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=0, err=0x75c09b20) at ./regcomp.c:2242
> #17869 0x006f265a in parse_branch (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=0, err=0x75c09b20) at ./regcomp.c:2176
> #17870 0x006ee668 in parse_reg_exp (regexp=0x75c09b30,
> preg=0x50b1a920, token=0x75aca4e0, syntax=2339405,
> nest=0, err=0x75c09b20) at ./regcomp.c:2121
> #17871 0x006e6db2 in parse (regexp=0x75c09b30,
> preg=0x50b1a920, syntax=2339405, err=0x75c09b20)
> at ./regcomp.c:2089
> #17872 0x006dd100 in re_compile_internal (
> preg=0x50b1a920,
> pattern=0x52c10200
> "()\326*()+\\2+()*\\2\277()\326*))*\\W3^\\e<\"\003^*", '('  times>..., length=28345, syntax=2339405)
> at ./regcomp.c:764
> #17873 0x006dc5ca in re_compile_pattern (
> pattern=0x52c10200
> "()\326*()+\\2+()*\\2\277()\326*))*\\W3^\\e<\"\003^*", '('  times>..., length=28345,
> bufp=0x50b1a920) at ./regcomp.c:217
> #17874 0x006a4128 in make_regexp (
> s=0x52c08200
> "()\326*()+\\5342+()*\\5342\277()\326*))*\\W3^\\e<\"\003^*", '('  160 times>..., len=28345, ignorecase=false,
> dfa=true, canfatal=false) at re.c:257
> #17875 0x005944c4 in make_regnode (type=Node_regex,
> exp=0x52609720)
> at /home/ttfish/Project/2024/DSLFuzz/gawk/awkgram.y:5297
> #17876 0x005728a6 in yyparse ()
> at /home/ttfish/Project/2024/DSLFuzz/gawk/awkgram.y:572
> #17877 0x0059fe3d in parse_program (
> pcode=0x113d8a0 , from_eval=false)
> at /home/ttfish/Project/2024/DSLFuzz/gawk/awkgram.y:2803
> #17878 0x006783e8 in main (argc=4, argv=0x7fffd9c8)
> at main.c:504
> ```
>
> ## Impact
>
> This vulnerability allows attackers to cause a denial of service by
> crashing the gawk instance or malicious memory manipulation.
>
> ## Attachments
>
> Please find the attached PoC file in the attachment.
>
> Please feel free to contact me if you have any further questions.
>
> Best regards,
> ttfish



POCFILE
Description: Binary data