[
https://issues.apache.org/jira/browse/JENA-615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13856380#comment-13856380
]
Andy Seaborne commented on JENA-615:
------------------------------------
That would be a good pattern if it can be optimized.
Since 2.11.0 was released, filter placement has been improved, which helps this
sort of pattern when {{# Some patterns}} is complex.
Example:
{noformat}
PREFIX : <http://example/>
SELECT * {
?var :p ?o .
OPTIONAL {?var :q ?v }
FILTER(?var != <http://constant>)
}
{noformat}
used to give (2.11.0):
{noformat}
(prefix ((: <http://example/>))
(filter (!= ?var <http://constant>)
(conditional
(bgp (triple ?var :p ?o))
(bgp (triple ?var :q ?v)))))
{noformat}
and now gives (2.11.1 development):
{noformat}
(prefix ((: <http://example/>))
(conditional
(filter (!= ?var <http://constant>)
(bgp (triple ?var :p ?o)))
(bgp (triple ?var :q ?v))))
{noformat}
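To illustrate why this move is safe, here is a minimal Python sketch over a hypothetical miniature algebra. The classes {{BGP}}, {{Conditional}} and {{Filter}} and the helpers {{certain_vars}} and {{place_filter}} are illustrative assumptions, not ARQ's classes or its actual filter-placement code. A filter over {{(conditional L R)}} can be pushed into {{L}} when every variable it mentions is bound in every solution of {{L}}: the optional side only extends a row with compatible bindings, so it can never change {{?var}}. A filter on {{?v}}, which only the optional side binds, has to stay where it is.

```python
from dataclasses import dataclass

# Hypothetical miniature algebra (not ARQ's classes); variables start with "?".
@dataclass(frozen=True)
class BGP:
    triples: tuple  # tuple of (subject, predicate, object) terms

@dataclass(frozen=True)
class Conditional:  # OPTIONAL: required left side, optional right side
    left: object
    right: object

@dataclass(frozen=True)
class Filter:
    expr: str              # expression text, kept only for display
    expr_vars: frozenset   # variables the expression mentions
    sub: object

def certain_vars(op):
    """Variables bound in every solution of op."""
    if isinstance(op, BGP):
        return {t for triple in op.triples for t in triple if t.startswith("?")}
    if isinstance(op, Conditional):
        return certain_vars(op.left)  # the optional side may leave its variables unbound
    if isinstance(op, Filter):
        return certain_vars(op.sub)
    raise TypeError(f"unsupported op: {op!r}")

def place_filter(op):
    """Push a filter over (conditional L R) into L when L always binds the filter's variables."""
    if isinstance(op, Filter) and isinstance(op.sub, Conditional):
        cond = op.sub
        if op.expr_vars <= certain_vars(cond.left):
            return Conditional(Filter(op.expr, op.expr_vars, cond.left), cond.right)
    return op

before = Filter("(!= ?var <http://constant>)", frozenset({"?var"}),
                Conditional(BGP((("?var", ":p", "?o"),)),
                            BGP((("?var", ":q", "?v"),))))
after = place_filter(before)
print(after)
```

The result has the same shape as the 2.11.1 algebra: the filter now wraps only the required {{bgp}}.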
so the removal of solutions where {{?var = <http://constant>}} now happens as early as possible.
For some engines, like TDB, where the lowest level is not the Nodes themselves
but some internal id, {{FILTER(?var != <http://constant>)}} naively needs
to get the string form of {{?var}}. It could instead work in reverse: find the
internal id for {{<http://constant>}}, then filter on the internal ids, which
is much more efficient (no need to touch the node table). However, as it's a !=
test, most values of {{?var}} will pass and might well be returned anyway, so
their string forms will be needed later and avoiding the lookup early may not
matter too much.
A special case is when {{<http://constant>}} is not in the data at all: looking up its internal id finds nothing, so no stored value of {{?var}} can match it.
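A minimal Python sketch of the reverse lookup, assuming a TDB-like layout where solutions hold internal ids and a separate node table maps ids back to terms. The dicts and function names are hypothetical, not TDB's API. It also assumes each distinct term has exactly one id, which makes id equality the same as term equality for an IRI constant; literals compared by value would need more care.

```python
# Hypothetical TDB-like layout: solutions carry internal ids, and a separate
# node table maps each id back to its RDF term.
node_table = {1: "<http://example/a>", 2: "<http://constant>", 3: "<http://example/b>"}
term_to_id = {term: nid for nid, term in node_table.items()}  # reverse index

rows = [{"?var": 1}, {"?var": 2}, {"?var": 3}]  # ?var is bound in every row

def filter_not_equal_naive(rows, var, constant):
    """Decode every row's id through the node table, then compare terms."""
    return [r for r in rows if node_table[r[var]] != constant]

def filter_not_equal_by_id(rows, var, constant):
    """Look the constant up once, then compare ids without touching the node table."""
    cid = term_to_id.get(constant)
    if cid is None:
        # The constant is not in the data, so no stored id can equal it:
        # every row passes and the filter could be dropped entirely.
        return list(rows)
    return [r for r in rows if r[var] != cid]

print(filter_not_equal_by_id(rows, "?var", "<http://constant>"))  # [{'?var': 1}, {'?var': 3}]
print(filter_not_equal_by_id(rows, "?var", "<http://absent>"))    # all three rows
```

Both versions return the same rows; the id-based one does a single node-table lookup for the constant instead of one per row.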
As ever, it comes down to timing different designs to see which tradeoffs work
best.
> Possible optimisation for FILTER(?var != <constant>)
> ----------------------------------------------------
>
> Key: JENA-615
> URL: https://issues.apache.org/jira/browse/JENA-615
> Project: Apache Jena
> Issue Type: Improvement
> Components: ARQ
> Reporter: Rob Vesse
> Assignee: Rob Vesse
> Priority: Minor
> Labels: algebra, optimization, sparql
>
> I have an idea for a possible optimisation for queries of the following
> general form:
> {noformat}
> SELECT *
> WHERE
> {
> # Some Patterns
> FILTER(?var != <http://constant>)
> }
> {noformat}
> This pattern crops up surprisingly often in real SPARQL workloads, since it is
> often used either to restrict a variable by excluding certain values or to
> avoid self-referential links in the data.
> In some cases it seems like this could be safely rewritten as follows:
> {noformat}
> SELECT *
> WHERE
> {
> # Some Patterns
> MINUS { BIND(<http://constant> AS ?var) }
> }
> {noformat}
> Or perhaps in a more generalised form like so:
> {noformat}
> SELECT * WHERE
> {
> # Some patterns
> MINUS { VALUES ?var { <http://constant/1> <http://constant/2> } }
> }
> {noformat}
> This would nicely handle the case where a variable must not be equal to any
> of several constant values.
> As I pointed out earlier, this would not apply in every case; specifically, I
> think at least the following must be true:
> - The variable must be guaranteed to be bound (similar to existing filter
> equality and implicit join optimisations)
> There is also the potential to spot cases where the variable will always be
> unbound, so that the expression is always an error, and to replace the entire
> sub-tree with {{table empty}}, as we already do for equality and implicit join
> filters.
> I plan to look at implementing this in the new year. If anyone has any
> thoughts on this (especially with regard to restrictions on when the
> optimisation should be considered safe), please comment.
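The proposed rewrite and its boundness condition can be checked on small solution sequences. This is a minimal Python sketch, assuming solutions are dicts from variable name to term and that the constants are IRIs, so string equality stands in for term equality. {{minus}} follows the SPARQL 1.1 definition, where a left row is removed only if some right row is compatible with it and shares at least one variable. The helper names are illustrative; {{values}} with a single constant yields the same one-row table as {{BIND(<http://constant> AS ?var)}}.

```python
# Solutions are dicts from variable name to RDF term; IRIs are plain strings here.

def compatible(m1, m2):
    """Two solutions are compatible if they agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def minus(left, right):
    """SPARQL 1.1 MINUS: drop a left row only if some right row is compatible
    with it AND shares at least one variable with it."""
    return [m for m in left
            if not any(compatible(m, r) and bool(m.keys() & r.keys()) for r in right)]

def values(var, constants):
    """VALUES ?var { c1 c2 ... }; a single constant gives the same table as BIND(c AS ?var)."""
    return [{var: c} for c in constants]

def filter_not_equal(rows, var, constants):
    """FILTER(?var != c1 && ?var != c2 ...): an unbound ?var is an error, which drops the row."""
    return [m for m in rows if var in m and all(m[var] != c for c in constants)]

C1, C2 = "<http://constant/1>", "<http://constant/2>"

# ?var bound in every row: the MINUS rewrite and the FILTER agree.
bound_rows = [{"?var": C1, "?o": "a"}, {"?var": "<http://x>", "?o": "b"}, {"?var": C2, "?o": "c"}]
print(minus(bound_rows, values("?var", [C1, C2])))
print(filter_not_equal(bound_rows, "?var", [C1, C2]))

# ?var possibly unbound (e.g. bound only inside an OPTIONAL): they disagree.
maybe_rows = [{"?s": "<http://a>", "?var": C1}, {"?s": "<http://b>"}]
print(minus(maybe_rows, values("?var", [C1])))     # keeps the ?s = <http://b> row
print(filter_not_equal(maybe_rows, "?var", [C1]))  # returns []: that row's comparison errors
```

The second case is why the variable must be guaranteed to be bound: MINUS keeps a row that shares no variable with the right-hand side, while the FILTER discards it.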
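The always-unbound case can be sketched over a hypothetical miniature algebra (again not ARQ's classes; {{maybe_vars}} and {{simplify}} are illustrative names). {{maybe_vars}} collects every variable that some solution could bind; if {{?var}} is not among them, {{?var != <constant>}} raises an error on every row, so the filtered sub-tree can be replaced with an empty table. Only BGPs and OPTIONAL are handled here, an assumption made for brevity.

```python
from dataclasses import dataclass

# Hypothetical miniature algebra (not ARQ's classes); variables start with "?".
@dataclass(frozen=True)
class BGP:
    triples: tuple  # tuple of (subject, predicate, object) terms

@dataclass(frozen=True)
class Conditional:  # OPTIONAL: required left side, optional right side
    left: object
    right: object

@dataclass(frozen=True)
class Filter:  # stands for FILTER(?var != constant)
    var: str
    constant: str
    sub: object

@dataclass(frozen=True)
class TableEmpty:  # the (table empty) operator: no solutions
    pass

def maybe_vars(op):
    """Variables that at least one solution of op could bind."""
    if isinstance(op, BGP):
        return {t for triple in op.triples for t in triple if t.startswith("?")}
    if isinstance(op, Conditional):
        return maybe_vars(op.left) | maybe_vars(op.right)
    if isinstance(op, Filter):
        return maybe_vars(op.sub)  # a filter never binds anything
    if isinstance(op, TableEmpty):
        return set()
    raise TypeError(f"unsupported op: {op!r}")

def simplify(op):
    """Replace FILTER(?var != c) with an empty table when ?var can never be bound."""
    if isinstance(op, Filter) and op.var not in maybe_vars(op.sub):
        # ?var is unbound in every row, so != is always an error and every row is dropped.
        return TableEmpty()
    return op

never_bound = Filter("?x", "<http://constant>", BGP((("?s", ":p", "?o"),)))
print(simplify(never_bound))
```

A variable bound only on the optional side counts as possibly bound, so a filter on it is left alone.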