There are two cases: I'd prefer to leave the static case as a syntax error and improve the dynamic case.

1/ If its a fixed pattern, it is compiled once.
An error here should become a "syntax error" to
E_Regex.init via makeRegexEngine.

This is a common case.

2/ If a dynamic pattern, a ExprEvalException (it has no stacktrace because fillInStackTrace is overridden for efficiency) would be better.

So throwing a ExprEvalException, catching in E_Regex.init and then throwing ExprException to become a syntax error as currently.

In addition, in (2), Log.warnOnce would be good.

    Andy

On 24/03/17 11:55, chris dollin wrote:
Dear All

When tracking down the cause of a disc overflow
on our Fuseki servers [0], we found that the log
files were full of stack dumps caused by queries
containing non-constant [1] illegal regular expression
patterns.

In a query like

    SELECT * WHERE {?s ?p ?o FILTER(REGEX(?o, ?o + ")DEF", "")}

where the pattern part of the regex is not constant,
the regex is compiled (in RegexJava) when the filter is
evaluated against the solution bindings. If the pattern
is syntactically illegal, a PatternSyntaxException is
thrown, caught by RegexJava in makePattern, which then
thows an ExprException.

The ExprException is caught in QueryIterFilterExpr.accept()
and a log entry generated, including the (large) stack
trace, and the filter is treated as FALSE so no solution
is generated.

When there are many -- say millions -- of results to filter
this generates millions of unhelpful stack traces,
[unhelpful because it confuses the evaluation of the
SPARQL query with the execution of the Java ARQ code]
and the disc fills up with logfile.

This is of course an edge case but we'd like to
avoid being cut so badly by the edge. Any other
filter expression that can throw EvalException will
have similar behaviour.

At the very least we'd like to be able to reduce the
size of each log message by omitting the stack trace [2].
But when there are millions of results to filter,
there will still be millions of lines going to the
logs.

An alternative would be to limit the number of warning log
entries generated for any given filter.

If the pattern part of a regex filter is treated as part
of the query rather than part of the data, even in the
case above that we started with, then a syntax error in
the pattern could be regarded as an error in the query,
and the entire query execution abandoned. Making this
spec-compatible might be tricky, though.

Thoughts?

Chris

[0] Fuseki 2.4.1

[1] When the pattern is a constant, which it will
    usually be, it is compiled once before the query
    runs and generates a single different error.

[2] One way to do this would be to have RegexJava.makePattern
    throw an ExprEvalException rather than an ExprException;
    we note empirically that an ExprEvalException generates
    a log entry that doesn't contain a stack trace. Or
    we could introduce additional specific exception types.


Reply via email to