[ https://issues.apache.org/jira/browse/PHOENIX-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517546#comment-16517546 ]
Marcell Ortutay commented on PHOENIX-4666:
------------------------------------------

[~jamestaylor] :
> Easiest would be to just let Phoenix rerun the portion of the query that
> didn't find the hash cache. What's the reason this approach won't work?
> Otherwise, the top level method of BaseResultIterators managing the parallel
> scans is submitWork(). Maybe you could do what you want from there? Or higher
> up in the stack where the hash cache is being created?

I haven't been able to find a place to put the exception handler that works. Below is some additional info.

The hash join cache is generated in the HashJoinPlan.iterator() method. This is the stack trace from that location:

{code:java}
org.apache.phoenix.execute.HashJoinPlan.iterator(HashJoinPlan.java:182)
org.apache.phoenix.execute.DelegateQueryPlan.iterator(DelegateQueryPlan.java:144)
org.apache.phoenix.execute.DelegateQueryPlan.iterator(DelegateQueryPlan.java:139)
org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:316)
org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:295)
org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:294)
org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:286)
org.apache.phoenix.jdbc.PhoenixStatement.execute(PhoenixStatement.java:1838)
sqlline.Commands.execute(Commands.java:822)
sqlline.Commands.sql(Commands.java:732)
sqlline.SqlLine.dispatch(SqlLine.java:813)
sqlline.SqlLine.begin(SqlLine.java:686)
sqlline.SqlLine.start(SqlLine.java:398)
sqlline.SqlLine.main(SqlLine.java:291)
{code}

However, the actual execution of the query comes later. This is where PersistentHashJoinCacheNotFoundException is thrown:

{code:java}
org.apache.phoenix.coprocessor.PersistentHashJoinCacheNotFoundException: ERROR 900 (HJ01): Hash Join cache not found joinId: 7. The cache might have expired and have been removed.
	at org.apache.phoenix.util.ServerUtil.parseRemoteException(ServerUtil.java:189)
	at org.apache.phoenix.util.ServerUtil.parseServerExceptionOrNull(ServerUtil.java:174)
	at org.apache.phoenix.util.ServerUtil.parseServerException(ServerUtil.java:141)
	at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1327)
	at org.apache.phoenix.iterate.BaseResultIterators.getIterators(BaseResultIterators.java:1245)
	at org.apache.phoenix.iterate.RoundRobinResultIterator.getIterators(RoundRobinResultIterator.java:176)
	at org.apache.phoenix.iterate.RoundRobinResultIterator.next(RoundRobinResultIterator.java:91)
	at org.apache.phoenix.jdbc.PhoenixResultSet.next(PhoenixResultSet.java:805)
	at sqlline.BufferedRows.<init>(BufferedRows.java:37)
	at sqlline.SqlLine.print(SqlLine.java:1660)
	at sqlline.Commands.execute(Commands.java:833)
	at sqlline.Commands.sql(Commands.java:732)
	at sqlline.SqlLine.dispatch(SqlLine.java:813)
	at sqlline.SqlLine.begin(SqlLine.java:686)
	at sqlline.SqlLine.start(SqlLine.java:398)
	at sqlline.SqlLine.main(SqlLine.java:291)
{code}

Looking at this, the first common ancestor of the two stack traces is `sqlline.Commands.execute`, which is outside the Phoenix codebase. I also took a quick look at the sqlline code base. It appears that the flow that generates the hash join cache starts here: [https://github.com/julianhyde/sqlline/blob/master/src/main/java/sqlline/Commands.java#L823] and the flow that triggers the exception starts here: [https://github.com/julianhyde/sqlline/blob/master/src/main/java/sqlline/Commands.java#L834]. Given that, I'm not sure there is a place within Phoenix to implement the retry via exception-based flow control, though I might be missing something. Please let me know.
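To make the control-flow problem concrete, here is a minimal, self-contained Java sketch (not Phoenix code; all class and method names are hypothetical stand-ins). The cache is built when the plan's iterator is created, but the miss only surfaces later, during lazy iteration, so a handler that re-runs the cache generation has to wrap the *consumption* of the iterator rather than its creation:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for PersistentHashJoinCacheNotFoundException.
class CacheNotFoundException extends RuntimeException {
    final String joinId;
    CacheNotFoundException(String joinId) {
        super("Hash Join cache not found joinId: " + joinId);
        this.joinId = joinId;
    }
}

public class RetryOnCacheMiss {
    // Stand-in for the coprocessor-side server cache.
    static final Map<String, List<String>> serverCache = new ConcurrentHashMap<>();

    // Analogue of the cache build done at HashJoinPlan.iterator() time.
    static void buildCache(String joinId) {
        serverCache.put(joinId, List.of("row1", "row2"));
    }

    // Analogue of the scan: lazy, so the cache lookup happens at next() time,
    // long after buildCache() returned -- a miss cannot be caught there.
    static Iterator<String> openScan(String joinId) {
        return new Iterator<String>() {
            int i = 0;
            public boolean hasNext() { return i < 2; }
            public String next() {
                List<String> rows = serverCache.get(joinId);
                if (rows == null) throw new CacheNotFoundException(joinId);
                return rows.get(i++);
            }
        };
    }

    // The retry wraps consumption: on a miss, re-run the cache generation
    // step and retry the whole scan once.
    static List<String> scanWithRetry(String joinId) {
        for (int attempt = 0; ; attempt++) {
            try {
                List<String> out = new ArrayList<>();
                Iterator<String> it = openScan(joinId);
                while (it.hasNext()) out.add(it.next());
                return out;
            } catch (CacheNotFoundException e) {
                if (attempt >= 1) throw e; // give up after one rebuild
                buildCache(e.joinId);      // rebuild the expired cache, retry
            }
        }
    }

    public static void main(String[] args) {
        buildCache("7");                        // cache built at plan time
        serverCache.remove("7");                // simulate TTL expiry before the scan
        System.out.println(scanWithRetry("7")); // prints "[row1, row2]"
    }
}
```

In Phoenix terms, the retry loop would need to sit somewhere that can see both the cache-building step and the scan, which is exactly the common ancestor that appears to be missing inside the codebase.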
I've experimented with putting the exception handler in various places, including ParallelIterators.submitWork() at [https://github.com/apache/phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/iterate/ParallelIterators.java#L135], but it does not re-run the hash join cache generation, as that happens in an earlier part of query execution. Guidance/advice appreciated. FWIW, the RPC check version does work as expected and improves our query runtimes substantially.

> Add a subquery cache that persists beyond the life of a query
> -------------------------------------------------------------
>
>                 Key: PHOENIX-4666
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4666
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Marcell Ortutay
>            Assignee: Marcell Ortutay
>            Priority: Major
>
> The user list thread for additional context is here:
> [https://lists.apache.org/thread.html/e62a6f5d79bdf7cd238ea79aed8886816d21224d12b0f1fe9b6bb075@%3Cuser.phoenix.apache.org%3E]
> ----
> A Phoenix query may contain expensive subqueries, and moreover those
> expensive subqueries may be used across multiple different queries. While
> whole-result caching is possible at the application level, it is not possible
> to cache subresults in the application. This can cause bad performance for
> queries in which the subquery is the most expensive part of the query, and
> the application is powerless to do anything at the query level. It would be
> good if Phoenix provided a way to cache subquery results, as it would provide
> a significant performance gain.
>
> An illustrative example:
>
> SELECT * FROM table1 JOIN (SELECT id_1 FROM large_table WHERE x = 10)
> expensive_result ON table1.id_1 = expensive_result.id_2 AND table1.id_1 = \{id}
>
> In this case, the subquery "expensive_result" is expensive to compute, but it
> doesn't change between queries. The rest of the query does, because of the
> \{id} parameter. This means the application can't cache it, but it would be
> good if there was a way to cache expensive_result.
>
> Note that there is currently a coprocessor-based "server cache", but the data
> in this "cache" is not persisted across queries. It is deleted after a TTL
> expires (30 sec by default), or when the query completes.
>
> This issue is fairly high priority for us at 23andMe, and we'd be happy to
> provide a patch with some guidance from Phoenix maintainers. We are currently
> putting together a design document for a solution, and we'll post it to this
> Jira ticket for review in a few days.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
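As an illustration of the persistence model the issue description asks for, cached subresults could be keyed on the subquery text itself, so outer queries that differ only in the \{id} parameter share one cached "expensive_result". A minimal Java sketch of that idea (all class and method names here are hypothetical, not Phoenix APIs):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: persist subquery results under a key derived from the
// subquery text alone, independent of the surrounding outer query.
public class SubqueryResultCache {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();

    // Key on a digest of the (trimmed) subquery SQL.
    static String cacheKey(String subquerySql) {
        try {
            byte[] d = MessageDigest.getInstance("SHA-256")
                    .digest(subquerySql.trim().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : d) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // Run the expensive subquery only on a miss; later outer queries that
    // contain the same subquery reuse the cached result.
    public List<String> resultFor(String subquerySql, Supplier<List<String>> compute) {
        return cache.computeIfAbsent(cacheKey(subquerySql), k -> compute.get());
    }
}
```

A real design would of course also need the TTL/eviction and write-invalidation story that the existing query-scoped server cache sidesteps, which is presumably what the forthcoming design document will cover.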