rmuir opened a new pull request, #14227:
URL: https://github.com/apache/lucene/pull/14227
string: `?+½]+]+Ř*+[\]ᖴﴁ.`
expected: before #14193
```
java.lang.IllegalArgumentException: expected ']' at position 17
```
actual: after #14193
```
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_CONCATENATION
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=?
REGEXP_CHAR char=½
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=]
REGEXP_CHAR char=
REGEXP_REPEAT_MIN min=1
REGEXP_CHAR char=]
REGEXP_REPEAT_MIN min=1
REGEXP_REPEAT
REGEXP_CHAR char=Ř
REGEXP_CHAR_CLASS starts=[] ends=[]
REGEXP_STRING string=ᖴﴁ
REGEXP_ANYCHAR
```
The problem is caused by `RegExp` accepting too much input rather than throwing
an exception as it should. The leniency in the parser comes from
`expandPreDefined()`, which intrudes on escape-character parsing for character
classes (e.g. `\s`). This adds a lot of complexity to parsing.

Don't invoke `expandPreDefined()` except for the set of characters that it
explicitly handles. This is also consistent with the way the complexity of
`expandPreDefined()` is managed elsewhere in the parser, such as in
`parseSimpleExp()`.

Add two parsing tests: `testEmptyClass()`, whose behavior is unchanged by this
PR but which should exist, and `testEscapedInvalidClass()`, which fails without
the change.
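
To illustrate the guard described above, here is a minimal, self-contained sketch of a toy character-class parser. It is not Lucene's `RegExp` code: the class `EscapeGuardSketch`, the method `parseCharClass`, and the `PREDEFINED` letter set (`dDsSwW`) are hypothetical names and assumptions chosen for illustration. The point is only that a backslash is routed to predefined-class handling when the following character is one that handler explicitly supports, and is otherwise treated as an escaped literal.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch only; names and the PREDEFINED set are assumptions, not Lucene's API.
final class EscapeGuardSketch {
    // Assumed set of letters that the predefined-class handler understands.
    private static final String PREDEFINED = "dDsSwW";

    /** Parses a bracketed class such as "[a\s]" and describes its members. */
    static List<String> parseCharClass(String s) {
        if (s.isEmpty() || s.charAt(0) != '[') {
            throw new IllegalArgumentException("expected '[' at position 0");
        }
        List<String> members = new ArrayList<>();
        int pos = 1;
        while (pos < s.length() && s.charAt(pos) != ']') {
            char c = s.charAt(pos++);
            if (c == '\\') {
                if (pos >= s.length()) {
                    throw new IllegalArgumentException("unexpected end-of-string");
                }
                char next = s.charAt(pos++);
                if (PREDEFINED.indexOf(next) >= 0) {
                    // Guard: only explicitly handled letters become predefined classes.
                    members.add("class:\\" + next);
                } else {
                    // Anything else is an escaped literal, including ']'.
                    members.add("char:" + next);
                }
            } else {
                members.add("char:" + c);
            }
        }
        if (pos >= s.length()) {
            // The class was never closed, so reject the input instead of guessing.
            throw new IllegalArgumentException("expected ']' at position " + pos);
        }
        return members;
    }

    public static void main(String[] args) {
        System.out.println(parseCharClass("[\\s]"));
        System.out.println(parseCharClass("[a\\]]"));
        try {
            parseCharClass("[\\]");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch, `[\]` is rejected because the escaped `]` is consumed as a literal and the class is never closed, whereas `[\s]` and `[]` parse normally.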
Closes #14224