John Phillips created PHOENIX-5239:
--------------------------------------

             Summary: Send persistent subquery cache to all regionservers
                 Key: PHOENIX-5239
                 URL: https://issues.apache.org/jira/browse/PHOENIX-5239
             Project: Phoenix
          Issue Type: Improvement
            Reporter: John Phillips


PHOENIX-4666 introduced a persistent subquery cache that allowed phoenix to 
cache the results from an expensive subquery (enabled with a 
{{USE_PERSISTENT_CACHE}} query hint) to speed up subsequent queries.

More context is available on the PHOENIX-4666 ticket, but a quick example would 
be a query like:
{code:java}
SELECT /*+ USE_PERSISTENT_CACHE */ *
    FROM table1
    JOIN (SELECT id_1 FROM large_table WHERE x = 10) expensive_result
    ON table1.id_1 = expensive_result.id_2
WHERE table1.id_1 = [some_id]
{code}
Where lots of queries are ran, differing only by {{some_id}}. Our usage 
involves first running one query over phoenix to warm the cache (which takes 
~20 seconds), then once complete, allowing the live query to run which utilize 
the persistent subquery cache (~100ms).

However, we noticed that when phoenix sends the cache to the regionservers, it 
looks at {{some_id}} in the outer query to figure out which regionservers might 
contain {{table1.id_1 = [some_id]}} ([code 
here|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]).
 This means that when we first start running the query, we'll inconsistently 
hit the cache until it ends up being propagated to all the regionservers.

Basically, we'd like to have some way to warm the subquery cache and ensure 
it's on all the regionservers so subsequent queries will always find the cache. 
I think the simplest solution might be updating the [if statement in 
ServerCacheClient#addServerCache|https://github.com/apache/phoenix/blob/2084a6c/phoenix-core/src/main/java/org/apache/phoenix/cache/ServerCacheClient.java#L282-L283]
 to simply always send the cache to all the regionservers if it's a persistent 
subquery:
{code:java}
- if ( ! servers.contains(entry) &&
-         keyRanges.intersectRegion(regionStartKey, regionEndKey,
-                 cacheUsingTable.getIndexType() == IndexType.LOCAL)) {
+ boolean keyRangesIntersect = keyRanges.intersectRegion(regionStartKey, 
regionEndKey,
+         cacheUsingTable.getIndexType() == IndexType.LOCAL);
+ if (!servers.contains(entry) && (keyRangesIntersect || usePersistentCache)) {
{code}
I tested this out, and it seems to work as expected. If it sounds like an 
acceptable solution, I'd be happy to make an actual PR. Or, if anyone has any 
other suggestions on better ways to handle this, it would be much appreciated.

FYI [~jamestaylor], [~elserj], and [~maryannxue] since it looks like you three 
handled most of the review on the [original persistent cache 
PR|https://github.com/apache/phoenix/pull/298]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to