I, personally, have not been able to figure out how to use the prospects table, so any insight there would be awesome. However, here are a couple of tips:
1. Move the most restrictive patterns as early as possible in the query. Rya doesn't do much (any?) reordering of clauses, so the sooner you can trim down the result set, the smaller your joins will be, and join size has the greatest time and memory impact on your query. For example, don't lead with '?myThing a :OneOfManyThings', because evaluating it is expensive and provides little value. Instead, lead with '?myThing :rarePredicate "evenRarerValue"' and then refine.
2. Narrow your result set as early as possible as well. If you have variables that are only used for filtering, move them into a sub-query that returns only the projection you need. That way the useless intermediate bindings are dropped earlier. This is more about space efficiency than time efficiency, but it does make a difference. Using auto-generated blank nodes, which are not carried through joins, also helps with this. For example, instead of '?a :p1 ?c . ?c a :ObjectType; :p2 ?theImportantThing' you can use '?a :p1 [ a :ObjectType; :p2 ?theImportantThing ]', or even '?a :p1/:p2 ?theImportantThing' if you don't need the type check, because both of these strip ?c from the result set.
3. Use the query plan to see whether any of your queries trigger a full table scan rather than a narrow range scan over an index. Jena can also give you a query plan, so look for discrepancies between the two. Jena may be smarter about reordering or choosing better joins, and you might be able to achieve the same on Rya by refactoring the query manually.
4. Use text search instead of 'FILTER(CONTAINS())', if that's applicable to your data.

If you provide an example of your query, I would be happy to try and give more specific advice.
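To make tips 1 and 2 a bit more concrete, here is a rough sketch of a query shaped along those lines. The namespace, the ':createdOn' predicate and the date are hypothetical placeholders (the other names are the ones from the tips), and I haven't benchmarked this exact query against Rya, so treat it as an illustration of the structure rather than a tuned query:

```sparql
# Hypothetical namespace and predicates, for illustration only.
PREFIX :    <http://example.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?myThing ?theImportantThing
WHERE {
  # Tips 1 and 2: start from the most selective pattern, and keep the
  # filter-only variable ?date inside a sub-query that projects ?myThing alone.
  {
    SELECT ?myThing
    WHERE {
      ?myThing :rarePredicate "evenRarerValue" ;
               :createdOn ?date .
      FILTER (?date >= "2019-01-01"^^xsd:date)
    }
  }
  # The broad type pattern now only refines the small candidate set.
  ?myThing a :OneOfManyThings .
  # Tip 2: the blank node stands in for ?c, so ?c never appears in the bindings.
  ?myThing :p1 [ a :ObjectType ; :p2 ?theImportantThing ] .
}
```

Because the sub-query only projects ?myThing, ?date never leaves it. Whether Rya actually evaluates the pieces in this order is something the query plan should confirm.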
________________________________
From: Brian Vincent <brian.vinc...@polarisalpha.com>
Sent: Friday, March 1, 2019 1:32:18 AM
To: dev@rya.incubator.apache.org
Subject: Query time optimization

Hello,

My team and I are doing a comparison between Apache Jena and Apache Rya for RDF storage and we're seeing a significant query time difference (Rya being slower) in one of the queries. (I'm using the Rya shell for performing the query, also have yet to run the prospects table script)

It's got me wondering if there's some common Accumulo or Rya tuning I can do? I'm also curious about the prospects table and what kind of difference you've found that to make?

Thanks so much!
Brian