Andriy Redko created LUCENE-10642:
-------------------------------------
Summary: Regexp query: escape sequences are treated as character
classes
Key: LUCENE-10642
URL: https://issues.apache.org/jira/browse/LUCENE-10642
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 9.0
Reporter: Andriy Redko
An interesting issue has been reported to the OpenSearch project [1], caused by
the changes in [2] and [3]. In a nutshell, the regression causes escape sequences
(like \n, \r, \t, ...) to be treated as character classes (see
https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html#bs).
The problematic function is RegExp::matchPredefinedCharacterClass, which does
not account for characters that denote an escaped construct.
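To illustrate the distinction that appears to be missing, here is a minimal sketch. It is not the actual RegExp code or a proposed patch: the class EscapeSketch, the enum Kind and the method classify are hypothetical names, and the set of class letters (d, D, s, S, w, W) is an assumption based on the usual predefined character classes. The idea is that only those letters are handled as classes after a backslash, while anything else (n, r, t, ...) falls through as an escaped construct instead of raising an exception.

```java
// Hypothetical sketch, not Lucene's RegExp implementation.
class EscapeSketch {
    // How the character following a backslash could be interpreted.
    enum Kind { PREDEFINED_CLASS, ESCAPED_LITERAL }

    // Classify the character that follows a backslash.
    // Assumption: only d, D, s, S, w, W denote predefined character classes.
    static Kind classify(char afterBackslash) {
        switch (afterBackslash) {
            case 'd': case 'D':
            case 's': case 'S':
            case 'w': case 'W':
                return Kind.PREDEFINED_CLASS;
            default:
                // Anything else is left to the escape handling
                // rather than rejected as an invalid character class.
                return Kind.ESCAPED_LITERAL;
        }
    }

    public static void main(String[] args) {
        for (char c : new char[] {'d', 'w', 'n', 'r', 't'}) {
            System.out.println("\\" + c + " -> " + classify(c));
        }
    }
}
```

With a check of this shape, a pattern such as \n would never reach the "invalid character class" branch, since n is not one of the assumed class letters.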
A simple test to reproduce, which fails with
IllegalArgumentException("invalid character class"):
```
public class TestRegexpQuery extends LuceneTestCase {
  // Both patterns currently fail with IllegalArgumentException.
  public void testEscapeSequences() throws IOException {
    assertEquals(1, regexQueryNrHits("\\n"));
    assertEquals(1, regexQueryNrHits("[\\n]"));
  }
}
```
[1] https://github.com/opensearch-project/OpenSearch/issues/3781
[2] https://github.com/apache/lucene/commit/1efce5444dd40142c55c5a3a30eeebc7b86796c3
[3] https://github.com/apache/lucene/commit/819e668ce2fcfcf86b652a191cdbe0fad0a8ffce