From Murtadha Hubail <[email protected]>:

Murtadha Hubail has submitted this change. ( https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227?usp=email )

Change subject: [ASTERIXDB-3772][RT] Surface ASX1113 for invalid regex patterns
......................................................................

[ASTERIXDB-3772][RT] Surface ASX1113 for invalid regex patterns

- user model changes: no
- storage format changes: no
- interface changes: no

Details:
Wrap Pattern.compile in RegExpMatcher.build with try/catch for
PatternSyntaxException; rethrow as RuntimeDataException with the
existing INVALID_REGEX_PATTERN(1113) error code.

Previously, invalid regex patterns in regex_matches / regex_replace /
regex_split / regex_position / regex_contains / regex_like (and their
flag/offset variants) surfaced as a generic "Internal error"
(code 25000) with a leaked Java stack trace. The error code already
existed in ErrorCode.INVALID_REGEX_PATTERN and was used in
ExternalDataUtils.java for the same purpose; this patch wires up the
runtime evaluator path to the same convention.

Ext-ref: NO TICKET
Change-Id: I09b5a2c5fd6fc2e845e88a702f0a4cca00d427da
Reviewed-on: https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227
Tested-by: Murtadha Hubail <[email protected]>
Reviewed-by: Murtadha Hubail <[email protected]>
Integration-Tests: Jenkins <[email protected]>
---
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
A asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
M asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
M asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
6 files changed, 110 insertions(+), 1 deletion(-)

Approvals:
  Jenkins: Verified
  Murtadha Hubail: Looks good to me, approved; Verified

diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
new file mode 100644
index 0000000..e419198
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.1.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Unclosed character class: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+    REGEXP_CONTAINS('abc', '[')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
new file mode 100644
index 0000000..55d817a
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.2.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Unclosed group: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+    REGEXP_LIKE('abc', '(abc')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
new file mode 100644
index 0000000..b7f860d
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.3.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Dangling quantifier: must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+    REGEXP_REPLACE('abc', '*xyz', 'q')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
new file mode 100644
index 0000000..6617164
--- /dev/null
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/queries_sqlpp/string/regexp_invalid_pattern_negative/regexp_invalid_pattern_negative.4.query.sqlpp
@@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+// Illegal character range (z before a): must surface ASX1113, not ASX25000.
+SELECT ELEMENT a FROM [
+    REGEXP_MATCHES('abc', '[z-a]')
+] AS a;
diff --git a/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml b/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
index 364881a..5e64cca 100644
--- a/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
+++ b/asterixdb/asterix-app/src/test/resources/runtimets/sqlpp_queries.xml
@@ -11205,6 +11205,16 @@
       </compilation-unit>
     </test-case>
     <test-case FilePath="string">
+      <compilation-unit name="regexp_invalid_pattern_negative">
+        <output-dir compare="Text">regexp_invalid_pattern_negative</output-dir>
+        <expected-error>ASX1113: Invalid pattern [</expected-error>
+        <expected-error>ASX1113: Invalid pattern (abc</expected-error>
+        <expected-error>ASX1113: Invalid pattern *xyz</expected-error>
+        <expected-error>ASX1113: Invalid pattern [z-a]</expected-error>
+        <source-location>false</source-location>
+      </compilation-unit>
+    </test-case>
+    <test-case FilePath="string">
       <compilation-unit name="regexp_contains/regex_contains">
         <output-dir compare="Text">regexp_contains/regex_contains</output-dir>
       </compilation-unit>
diff --git a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
index 1a190cc..34d4f1a 100644
--- a/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
+++ b/asterixdb/asterix-runtime/src/main/java/org/apache/asterix/runtime/evaluators/functions/utils/RegExpMatcher.java
@@ -21,7 +21,10 @@
 
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
+import java.util.regex.PatternSyntaxException;
 
+import org.apache.asterix.common.exceptions.ErrorCode;
+import org.apache.asterix.common.exceptions.RuntimeDataException;
 import org.apache.asterix.runtime.evaluators.functions.StringEvaluatorUtils;
 import org.apache.hyracks.api.exceptions.HyracksDataException;
 import org.apache.hyracks.data.std.primitive.UTF8StringPointable;
@@ -126,7 +129,11 @@
                 // use whatever flags the previous pattern was using
                 flags = pattern.flags();
             }
-            pattern = Pattern.compile(patternString, flags);
+            try {
+                pattern = Pattern.compile(patternString, flags);
+            } catch (PatternSyntaxException ex) {
+                throw new RuntimeDataException(ErrorCode.INVALID_REGEX_PATTERN, ex, patternString);
+            }
             matcher = pattern.matcher(charSeq);
         } else {
             matcher.reset(charSeq);

-- 
To view, visit https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/21227?usp=email
To unsubscribe, or for help writing mail filters, visit https://asterix-gerrit.ics.uci.edu/settings?usp=email

Gerrit-MessageType: merged
Gerrit-Project: asterixdb
Gerrit-Branch: lumina
Gerrit-Change-Id: I09b5a2c5fd6fc2e845e88a702f0a4cca00d427da
Gerrit-Change-Number: 21227
Gerrit-PatchSet: 21
Gerrit-Owner: Rithwik Koul <[email protected]>
Gerrit-Reviewer: Anon. E. Moose #1000171
Gerrit-Reviewer: Jenkins <[email protected]>
Gerrit-Reviewer: Murtadha Hubail <[email protected]>
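
Illustrative note: the try/catch wrapping described in the change details can be reproduced outside AsterixDB with only the JDK. The sketch below is not the patched RegExpMatcher code. `InvalidPatternDemo` and `InvalidPatternException` are hypothetical names, the exception is a stand-in for RuntimeDataException rather than the real class, and the message format is an assumption modeled on the `expected-error` entries in sqlpp_queries.xml. It only shows the general technique: catch `PatternSyntaxException` at the compile site and rethrow a coded exception that keeps the original cause.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class InvalidPatternDemo {

    // Assumed value, taken from the 1113 quoted in the change description.
    static final int INVALID_REGEX_PATTERN = 1113;

    /** Hypothetical stand-in for a coded runtime exception; not the AsterixDB class. */
    static final class InvalidPatternException extends RuntimeException {
        final int code;

        InvalidPatternException(int code, String pattern, Throwable cause) {
            // Message shape is an assumption modeled on the expected-error entries.
            super("ASX" + code + ": Invalid pattern " + pattern, cause);
            this.code = code;
        }
    }

    /** Compiles the pattern, translating a syntax error into the coded exception. */
    static Pattern compile(String patternString, int flags) {
        try {
            return Pattern.compile(patternString, flags);
        } catch (PatternSyntaxException ex) {
            // Keep the original exception as the cause so the detail is not lost.
            throw new InvalidPatternException(INVALID_REGEX_PATTERN, patternString, ex);
        }
    }

    public static void main(String[] args) {
        // The same four invalid patterns used by the negative tests in the diff.
        String[] invalid = { "[", "(abc", "*xyz", "[z-a]" };
        for (String p : invalid) {
            try {
                compile(p, 0);
                System.out.println("compiled: " + p);
            } catch (InvalidPatternException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```

Passing the caught exception as the cause keeps the JDK's description of the syntax error available for logging while the caller sees only the stable error code.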
