From Murtadha Hubail <[email protected]>:

Murtadha Hubail has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227?usp=email )

Change subject: [ASTERIXDB-3772][RT] Surface ASX1113 for invalid regex patterns
......................................................................

[ASTERIXDB-3772][RT] Surface ASX1113 for invalid regex patterns

- user model changes: no
- storage format changes: no
- interface changes: no

Details: Wrap Pattern.compile in RegExpMatcher.build with a try/catch for
PatternSyntaxException and rethrow it as a RuntimeDataException with the
existing INVALID_REGEX_PATTERN (1113) error code. Previously, an invalid
regex pattern passed to regex_matches / regex_replace / regex_split /
regex_position / regex_contains / regex_like (or their flag/offset
variants) surfaced as a generic "Internal error" (code 25000) with a
leaked Java stack trace. The error code already existed as
ErrorCode.INVALID_REGEX_PATTERN and was used in ExternalDataUtils.java
for the same purpose; this patch wires the runtime evaluator path to
the same convention.
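
The wrap-and-rethrow idea described above can be sketched in isolation as
follows. This is a minimal illustration, not the AsterixDB code:
InvalidRegexException and RegexCompileSketch are hypothetical stand-ins for
RuntimeDataException and RegExpMatcher, and the message format is only
assumed from the expected-error strings in the test suite.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for AsterixDB's RuntimeDataException; not the real class.
class InvalidRegexException extends RuntimeException {
    final int code;

    InvalidRegexException(int code, String pattern, Throwable cause) {
        // Assumed message shape, modeled on "ASX1113: Invalid pattern [".
        super(String.format("ASX%04d: Invalid pattern %s", code, pattern), cause);
        this.code = code;
    }
}

class RegexCompileSketch {
    // Numeric value cited in the commit message for INVALID_REGEX_PATTERN.
    static final int INVALID_REGEX_PATTERN = 1113;

    // Compile the pattern, translating a syntax error into a coded exception
    // that keeps the original PatternSyntaxException as its cause.
    static Pattern compile(String patternString, int flags) {
        try {
            return Pattern.compile(patternString, flags);
        } catch (PatternSyntaxException ex) {
            throw new InvalidRegexException(INVALID_REGEX_PATTERN, patternString, ex);
        }
    }

    public static void main(String[] args) {
        // A valid pattern compiles and matches as usual.
        System.out.println(compile("a+b", 0).matcher("aab").matches());
        try {
            compile("[", 0); // unclosed character class
        } catch (InvalidRegexException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

Keeping the PatternSyntaxException as the cause preserves the detail for
server-side logs while the caller sees only the coded message.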

Ext-ref: NO TICKET
Change-Id: I09b5a2c5fd6fc2e845e88a702f0a4cca00d427da
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227
Tested-by: Murtadha Hubail <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
Integration-Tests: Jenkins <[email protected]>
---
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
6 files changed, 110 insertions(+), 1 deletion(-)

Approvals:
  Jenkins: Verified
  Murtadha Hubail: Looks good to me, approved; Verified




diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
new file mode 100644
index 0000000..e419198
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Unclosed character class: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+  REGEXP_CONTAINS('abc', '[')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
new file mode 100644
index 0000000..55d817a
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Unclosed group: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+  REGEXP_LIKE('abc', '(abc')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
new file mode 100644
index 0000000..b7f860d
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Dangling quantifier: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+  REGEXP_REPLACE('abc', '*xyz', 'q')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
new file mode 100644
index 0000000..6617164
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Illegal character range (z before a): must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+  REGEXP_MATCHES('abc', '[z-a]')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml b/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
index 364881a..5e64cca 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
@@ -11205,6 +11205,16 @@
       </compilation-unit>
     </test-case>
     <test-case FilePath="string">
+      <compilation-unit name="regexp_invalid_pattern_negative">
+        <output-dir compare="Text">regexp_invalid_pattern_negative</output-dir>
+        <expected-error>ASX1113: Invalid pattern [</expected-error>
+        <expected-error>ASX1113: Invalid pattern (abc</expected-error>
+        <expected-error>ASX1113: Invalid pattern *xyz</expected-error>
+        <expected-error>ASX1113: Invalid pattern [z-a]</expected-error>
+        <source-location>false</source-location>
+      </compilation-unit>
+    </test-case>
+    <test-case FilePath="string">
       <compilation-unit name="regexp_contains/regex_contains">
         <output-dir compare="Text">regexp_contains/regex_contains</output-dir>
       </compilation-unit>
diff --git a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
index 1a190cc..34d4f1a 100644
--- a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
+++ b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
@@ -21,7 +21,10 @@

 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;

+import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.common.exceptions.RuntimeDataException;
 import org.apache.asterix.runtime.evaluators.functions.StringEvaluatorUtils;
 import org.apache.hyracks.api.exceptions.HyracksDataException;
 import org.apache.hyracks.data.std.primitive.UTF8StringPointable;
@@ -126,7 +129,11 @@
                 // use whatever flags the previous pattern was using
                 flags = pattern.flags();
             }
-            pattern = Pattern.compile(patternString, flags);
+            try {
+                pattern = Pattern.compile(patternString, flags);
+            } catch (PatternSyntaxException ex) {
+                throw new RuntimeDataException(ErrorCode.INVALID_REGEX_PATTERN, ex, patternString);
+            }
             matcher = pattern.matcher(charSeq);
         } else {
             matcher.reset(charSeq);
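
The four negative tests above rely on java.util.regex rejecting each pattern
at compile time. A small standalone check, assuming nothing beyond the JDK
(InvalidPatternCheck and isRejected are hypothetical names used only for
this illustration), might look like:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class InvalidPatternCheck {
    // Returns true when java.util.regex rejects the pattern at compile time.
    static boolean isRejected(String pattern) {
        try {
            Pattern.compile(pattern);
            return false;
        } catch (PatternSyntaxException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The same patterns used by the four .query.sqlpp files in this change:
        // unclosed class, unclosed group, dangling quantifier, illegal range.
        String[] patterns = { "[", "(abc", "*xyz", "[z-a]" };
        for (String p : patterns) {
            System.out.println(p + " rejected: " + isRejected(p));
        }
    }
}
```

Each of these raises PatternSyntaxException from Pattern.compile, which is
the exception the patched RegExpMatcher.build now catches.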

--
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227?usp=email
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: I09b5a2c5fd6fc2e845e88a702f0a4cca00d427da
Gerrit-Change-Number: 21227
Gerrit-PatchSet: 21
Gerrit-Owner: Rithwik Koul <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Murtadha Hubail <[email protected]>
