I agree that for the case where the pattern is constant (passes isConstant ie is a string literal), your case 1/) it's best treated as a syntax error -- which is the current behaviour, so no change needed there.
In 2/, converting to a syntax error means (if I read the code correctly) that the first bad pattern will terminate the entire query. Is that compatible with the SPARQL spec? I looked at the relevant sections and left unsure. If the query is terminated immediately then there's no need to use Log.warnOnce? Chris On 29 March 2017 at 22:49, Andy Seaborne <[email protected]> wrote: > There are two cases: I'd prefer to leave the static case as a syntax error > and improve the dynamic case. > > 1/ If its a fixed pattern, it is compiled once. > An error here should become a "syntax error" to > E_Regex.init via makeRegexEngine. > > This is a common case. > > 2/ If a dynamic pattern, a ExprEvalException (it has no stacktrace because > fillInStackTrace is overridden for efficiency) would be better. > > So throwing a ExprEvalException, catching in E_Regex.init and then > throwing ExprException to become a syntax error as currently. > > In addition, in (2), Log.warnOnce would be good. > > Andy > > > On 24/03/17 11:55, chris dollin wrote: > >> Dear All >> >> When tracking down the cause of a disc overflow >> on our Fuseki servers [0], we found that the log >> files were full of stack dumps caused by queries >> containing non-constant [1] illegal regular expression >> patterns. >> >> In a query like >> >> SELECT * WHERE {?s ?p ?o FILTER(REGEX(?o, ?o + ")DEF", "")} >> >> where the pattern part of the regex is not constant, >> the regex is compiled (in RegexJava) when the filter is >> evaluated against the solution bindings. If the pattern >> is syntactically illegal, a PatternSyntaxException is >> thrown, caught by RegexJava in makePattern, which then >> thows an ExprException. >> >> The ExprException is caught in QueryIterFilterExpr.accept() >> and a log entry generated, including the (large) stack >> trace, and the filter is treated as FALSE so no solution >> is generated. >> >> When there are many -- say millions -- of results to filter >> this generates millions of unhelpful stack traces, >> [unhelpful because it confuses the evaluation of the >> SPARQL query with the execution of the Java ARQ code] >> and the disc fills up with logfile. >> >> This is of course an edge case but we'd like to >> avoid being cut so badly by the edge. Any other >> filter expression that can throw EvalException will >> have similar behaviour. >> >> At the very least we'd like to be able to reduce the >> size of each log message by omitting the stack trace [2]. >> But when there are millions of results to filter, >> there will still be millions of lines going to the >> logs. >> >> An alternative would be to limit the number of warning log >> entries generated for any given filter. >> >> If the pattern part of a regex filter is treated as part >> of the query rather than part of the data, even in the >> case above that we started with, then a syntax error in >> the pattern could be regarded as an error in the query, >> and the entire query execution abandoned. Making this >> spec-compatible might be tricky, though. >> >> Thoughts? >> >> Chris >> >> [0] Fuseki 2.4.1 >> >> [1] When the pattern is a constant, which it will >> usually be, it is compiled once before the query >> runs and generates a single different error. >> >> [2] One way to do this would be to have RegexJava.makePattern >> throw an ExprEvalException rather than an ExprException; >> we note empirically that an ExprEvalException generates >> a log entry that doesn't contain a stack trace. Or >> we could introduce additional specific exception types. >> >> >> -- "What I don't understand is this ..." Trevor Chaplin, /The Beiderbeck Affair/ Epimorphics Ltd, http://www.epimorphics.com Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT Epimorphics Ltd. is a limited company registered in England (number 7016688)
