[ 
https://issues.apache.org/jira/browse/PHOENIX-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288670#comment-14288670
 ] 

Maryann Xue commented on PHOENIX-1179:
--------------------------------------

[~sunnychen] There are a few points to address about your problem:

1. It is the RHS that gets put into region server memory. So in your case, the 
size of "MAX_CT_STANDARD_TEST_TABLE1" should have nothing to do with the heap 
size at all.

2. Then why did changing the LHS size actually make a difference?
Most likely the scan of the LHS took so long that the client timed out and 
initiated a retry, which once again sent the RHS to the region servers. After 
several such overlapping retries, multiple versions of the RHS co-existed in 
region server memory, which led to the exception or the RS crash.

So a general solution is to increase the RPC timeout value for the HBase 
client, in order to avoid these pointless retries.
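For example, the timeouts can be raised in the client-side hbase-site.xml. The property names below are the standard HBase/Phoenix client settings; the values are only illustrative:

```xml
<!-- client-side hbase-site.xml; values are illustrative -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>600000</value> <!-- 10 minutes, in milliseconds -->
</property>
<property>
  <name>phoenix.query.timeoutMs</name>
  <value>600000</value> <!-- keep Phoenix's own query timeout in step -->
</property>
```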

3. Based on my assumption that your SMALL table's ids are only a small subset 
of your BIG table's ids, a much better solution is to use the hint 
"SKIP_SCAN_HASH_JOIN" for the query. If my assumption holds, you will see a 
big difference in query running time.
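For what it's worth, Phoenix hints go in a comment right after SELECT. The table and column names below are made up for illustration; only the hint syntax is the point:

```sql
-- hypothetical tables; SKIP_SCAN_HASH_JOIN is the hint from the comment above
SELECT /*+ SKIP_SCAN_HASH_JOIN */ b.id, b.val
FROM BIG_TABLE b
JOIN SMALL_TABLE s ON b.id = s.id;
```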

> Support many-to-many joins
> --------------------------
>
>                 Key: PHOENIX-1179
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1179
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: James Taylor
>            Assignee: Maryann Xue
>             Fix For: 4.3, 3.3
>
>         Attachments: 1179.patch
>
>
> Enhance our join capabilities to support many-to-many joins where the size of 
> both sides of the join are too big to fit into memory (and thus cannot use 
> our hash join mechanism). One technique would be to order both sides of the 
> join by their join key and merge sort the results on the client.
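The client-side merge of two pre-sorted streams described in the issue can be sketched roughly like this (illustrative only; for simplicity the right side is materialized as a list, whereas a true streaming version would buffer only the current run of equal-key rows):

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Yield joined pairs from two inputs sorted by join key.

    Handles many-to-many matches: for each left row, all right rows
    with an equal key are emitted. Rows are (key, value) tuples here.
    """
    right_rows = list(right)  # simplification; a real client would stream
    i = 0
    for l in left:
        k = key(l)
        # advance the right cursor past smaller keys
        while i < len(right_rows) and key(right_rows[i]) < k:
            i += 1
        # emit every right row whose key matches (many-to-many)
        j = i
        while j < len(right_rows) and key(right_rows[j]) == k:
            yield (l, right_rows[j])
            j += 1

left = [(1, "a"), (2, "b"), (2, "c")]
right = [(2, "x"), (2, "y"), (3, "z")]
print(list(merge_join(left, right)))
```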



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
