[ https://issues.apache.org/jira/browse/JENA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17144732#comment-17144732 ]
Jonas Sourlier commented on JENA-1918:
--------------------------------------

Thank you Andy!

> Bad performance of path sequence and path*
> ------------------------------------------
>
>                 Key: JENA-1918
>                 URL: https://issues.apache.org/jira/browse/JENA-1918
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: Jena
>    Affects Versions: Jena 3.15.0
>            Reporter: Jonas Sourlier
>            Assignee: Andy Seaborne
>            Priority: Major
>             Fix For: Jena 3.16.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> I want to execute the following SPARQL query against my local Apache Jena
> instance (with a preloaded Wikidata dump, using TDB2):
> {code:java}
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX ps: <http://www.wikidata.org/prop/statement/>
> PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
> SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
>
> WHERE {
>   ?item wdt:P31/wdt:P279* wd:Q23397.
>
>   OPTIONAL { ?item wdt:P201 ?outflow. }
>   OPTIONAL { ?item wdt:P4614 ?drainageBasin. }
>   OPTIONAL { ?item wdt:P625 ?coordinates. }
>   OPTIONAL { ?item wdt:P2044 ?elevation. }
>   OPTIONAL { ?item wdt:P17 ?country. }
> }
>
> ORDER BY ?item LIMIT 1 OFFSET 0
> {code}
> When run on query.wikidata.org (which uses Blazegraph), this query takes 26
> seconds to complete. Other queries run in about the same time on my Jena
> instance as on query.wikidata.org.
> Apache Jena, however, runs this query for several hours, using one CPU core
> and 3-4 GB of memory, and then hits a timeout (the timeout could be
> increased, but that is not the issue here).
> My question is: why is this so much slower than on Blazegraph? Can this
> query be rewritten to perform better? Can the query optimizer be tuned to
> run it more efficiently?
> If not, I consider this a bug, because the query itself should not generate
> such a heavy workload.
> If the query optimizer evaluated the
> {code:java}
> wdt:P31/wdt:P279*{code}
> path first and then limited its results via the
> {code:java}
> ORDER BY ?item LIMIT 1 OFFSET 0{code}
> clause, there would be just one item for which it would need to execute the
> {code:java}
> OPTIONAL { ?item ... }{code}
> joins.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
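For illustration only, the reordering suggested in the issue can be written by hand as a SPARQL 1.1 subquery. The subquery sorts and limits the path results before the OPTIONAL lookups run. This is a sketch under stated assumptions, not the change made for Jena 3.16.0, and it has not been timed against the TDB2 Wikidata store. It assumes OFFSET 0 and that a single result row is all that is wanted. Only the two prefixes the query actually uses are kept.

{code:java}
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
# Hand-written rewrite (an assumption, not output of Jena's optimizer).
SELECT ?item ?outflow ?drainageBasin ?coordinates ?elevation ?country
WHERE {
  # Evaluate the path pattern on its own and keep only the first ?item
  # in sort order.
  {
    SELECT ?item WHERE {
      ?item wdt:P31/wdt:P279* wd:Q23397.
    }
    ORDER BY ?item
    LIMIT 1
  }
  # The OPTIONAL lookups now only see that single ?item binding.
  OPTIONAL { ?item wdt:P201 ?outflow. }
  OPTIONAL { ?item wdt:P4614 ?drainageBasin. }
  OPTIONAL { ?item wdt:P625 ?coordinates. }
  OPTIONAL { ?item wdt:P2044 ?elevation. }
  OPTIONAL { ?item wdt:P17 ?country. }
}
# Still needed: the item may have several values for an optional property.
ORDER BY ?item LIMIT 1
{code}

OPTIONAL never removes an ?item binding, so the first ?item in sort order is the same with or without the OPTIONAL patterns. With the outer LIMIT 1, the result should therefore match the original query's. The only difference is which row is returned when that item has several values for one of the optional properties, and SPARQL leaves that choice unspecified in both forms. For an OFFSET other than 0, the subquery's OFFSET and LIMIT would need adjusting, because the original query pages over rows rather than over items.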