[jira] [Created] (JENA-1918) Bad performance when using ORDER BY

Jonas Sourlier (Jira) Wed, 17 Jun 2020 00:26:22 -0700

Jonas Sourlier created JENA-1918:
------------------------------------

             Summary: Bad performance when using ORDER BY
                 Key: JENA-1918
                 URL: https://issues.apache.org/jira/browse/JENA-1918
             Project: Apache Jena
          Issue Type: Bug
          Components: Jena
    Affects Versions: Jena 3.15.0
            Reporter: Jonas Sourlier



I want to run the following SPARQL query against my local Apache Jena instance 
(a Wikidata dump preloaded into TDB2):
{code:java}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
WHERE {
  ?item wdt:P31/wdt:P279* wd:Q23397 .

  OPTIONAL { ?item wdt:P201 ?outflow . }
  OPTIONAL { ?item wdt:P4614 ?drainageBasin . }
  OPTIONAL { ?item wdt:P625 ?coordinates . }
  OPTIONAL { ?item wdt:P2044 ?elevation . }
  OPTIONAL { ?item wdt:P17 ?country . }
}
ORDER BY ?item LIMIT 1 OFFSET 0
{code}
When run on query.wikidata.org (which uses Blazegraph), this query takes 26 
seconds to complete. Other queries take about the same time on my local Jena 
as they do on query.wikidata.org.

With this query, Apache Jena runs for several hours, using one CPU core and 
3-4 GB of memory, and then hits a timeout (the timeout could be increased, but 
that is not the issue here).

My question is: why is this so much slower than Blazegraph? Can this query be 
rewritten to perform better? Can the query optimizer be tuned to run it more 
efficiently?

If not, then I consider this a bug, because the query itself should not 
generate such a large workload. If the query optimizer evaluated the
{code:java}
wdt:P31/wdt:P279*{code}
property path first, then restricted the result via the
{code:java}
ORDER BY ?item LIMIT 1 OFFSET 0{code}
clause, there would be just one item for which it would need to execute the
{code:java}
OPTIONAL { ?item ... }{code}
joins.
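
To illustrate the evaluation order described above, the same intent can be 
written by hand as a SPARQL 1.1 subquery. This is only a sketch: it has not 
been timed on Jena or Blazegraph, and the inner query still has to enumerate 
and sort every matching item, so it may or may not avoid the slowdown. The 
prefixes that the query does not use are left out.
{code:java}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
WHERE {
  # Inner query: resolve the property path and keep only the first item.
  {
    SELECT ?item
    WHERE { ?item wdt:P31/wdt:P279* wd:Q23397 . }
    ORDER BY ?item
    LIMIT 1
  }

  # The OPTIONAL joins now apply to that single item only.
  OPTIONAL { ?item wdt:P201 ?outflow . }
  OPTIONAL { ?item wdt:P4614 ?drainageBasin . }
  OPTIONAL { ?item wdt:P625 ?coordinates . }
  OPTIONAL { ?item wdt:P2044 ?elevation . }
  OPTIONAL { ?item wdt:P17 ?country . }
}
ORDER BY ?item
LIMIT 1
{code}
The outer LIMIT 1 is kept because the selected item can have several values 
for one property (for example more than one country), which would otherwise 
yield several rows. As in the original query, which of those rows is returned 
is not determined by ORDER BY ?item alone.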



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
