I, personally, have not been able to figure out how to use the prospects table, 
so any insight there would be awesome. However, here are a few general tips:


  1.  Move the most restrictive patterns as early as possible in the query. Rya 
doesn't do much (any?) reordering of clauses, so the sooner you can trim down 
the result set, the smaller your joins will be, and join size has the greatest 
impact on your query's time and memory. E.g. don't lead with '?myThing a 
:OneOfManyThings', because evaluating that is expensive and provides little 
value. Instead lead with '?myThing :rarePredicate "evenRarerValue"' and then 
refine.
  2.  Narrow your result set as early as possible as well. If you have some 
variables only used for filtering, move them into a sub-query that only returns 
the projection you need; that way useless intermediates are discarded earlier. 
This is more about space efficiency than time efficiency, but it does make a 
difference. Utilizing auto-generated blank nodes, which are not maintained 
after joins, also helps with this, e.g. instead of '?a :p1 ?c . ?c a 
:ObjectType; :p2 ?theImportantThing' you can use '?a :p1 [ a :ObjectType; :p2 
?theImportantThing]', or even '?a :p1/:p2 ?theImportantThing' if you can do 
without the type check on ?c, because both of these strip ?c from the result 
set.
  3.  Use the query plan to see if any of your queries trigger a full table 
scan, as opposed to a narrow index range scan. Jena can also provide you a 
query plan, so look for discrepancies between the two. Jena may be smarter 
about reordering or utilizing better joins, and you might be able to achieve 
the same on Rya with a manual query refactor.
  4.  Utilize text search instead of 'FILTER(CONTAINS())', if that's applicable.
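
To make tips 1 and 2 concrete, here is a rough before/after sketch. The ':' 
prefix IRI and every name in it are placeholders borrowed from the examples 
above, not a real schema, and I have not benchmarked this on Rya. A query 
written in the "natural" order might look like this:

```sparql
# Hypothetical vocabulary: http://example.org/ and all names are placeholders.
PREFIX : <http://example.org/>

SELECT ?myThing ?theImportantThing WHERE {
  # Broad type pattern first: matches many subjects, so the joins start large.
  ?myThing a :OneOfManyThings .
  # ?c is only an intermediate, but it is carried through every join.
  ?myThing :p1 ?c .
  ?c a :ObjectType ;
     :p2 ?theImportantThing .
  # The selective pattern comes last, after the expensive work is done.
  ?myThing :rarePredicate "evenRarerValue" .
}
```

Assuming the clauses are evaluated roughly in the order written, the same 
query could be reordered like this:

```sparql
# Hypothetical vocabulary: http://example.org/ and all names are placeholders.
PREFIX : <http://example.org/>

SELECT ?myThing ?theImportantThing WHERE {
  # Most restrictive pattern first, so later joins see few candidates.
  ?myThing :rarePredicate "evenRarerValue" .
  # Blank node syntax replaces the named intermediate ?c.
  ?myThing :p1 [ a :ObjectType ; :p2 ?theImportantThing ] .
  # Broad type check last, applied only to the survivors.
  ?myThing a :OneOfManyThings .
}
```

Both forms should return the same rows; only the clause order and the handling 
of the intermediate node differ.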


If you provide an example of your query, I would be happy to try and give more 
specific advice.

________________________________
From: Brian Vincent <brian.vinc...@polarisalpha.com>
Sent: Friday, March 1, 2019 1:32:18 AM
To: dev@rya.incubator.apache.org
Subject: Query time optimization

Hello,

My team and I are doing a comparison between Apache Jena and Apache Rya for RDF 
storage and we're seeing a significant query time difference (Rya being slower) 
in one of the queries.  (I'm using the Rya shell for performing the query, also 
have yet to run the prospects table script)  It's got me wondering if there's 
some common Accumulo or Rya tuning I can do?  I'm also curious about the 
prospects table and what kind of difference you've found that to make?

Thanks so much!
Brian
