[jira] [Comment Edited] (PHOENIX-3271) Distribute UPSERT SELECT across cluster

James Taylor (JIRA) Thu, 26 Jan 2017 22:35:26 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842317#comment-15842317
 ]


James Taylor edited comment on PHOENIX-3271 at 1/27/17 6:33 AM:
----------------------------------------------------------------

I like your long term ideas, [~enis] (JIRA please?), but I think this patch is 
good in the near term. The timeouts should be prevented by our RenewLease 
client side impl (by [~samarthjain]). If HBase let us renew leases on the 
server side, that'd be an improvement, but what we have works.

IMHO, having a safeguard config would lead to code duplication and make 
maintenance harder. I think we're ok without it (provided we do adequate 
testing). This patch should improve global index build times substantially.







> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch, 
> PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch, PHOENIX-3271_v4.patch, 
> PHOENIX-3271_v5.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local 
> index is orders of magnitude faster than creation of global indexes (17 
> seconds versus 10-20 minutes - though more data is written in the global 
> index case). Under the covers, a global index is created through the running 
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a 
> table. In both of these cases, the data being upserted must all flow back to 
> the same client which can become a bottleneck for a large table. Instead, 
> what can be done is to push each separate, chunked UPSERT SELECT call out to 
> a different region server for execution there. One way we could implement 
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT 
> out to each region server and return the number of rows that were upserted 
> back to the client.
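
For readers unfamiliar with the proposal, here is a rough sketch of the idea
in the description above. It is a toy in-memory simulation, not Phoenix or
HBase code: regions are plain lists, the index table is a dict, and threads
stand in for region servers. The names upsert_select_chunk and
distributed_upsert_select are made up for illustration. The point it tries to
show is that each chunk is written where it is read, and only a row count
travels back to the client.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def upsert_select_chunk(region_rows, index_table, lock):
    """Hypothetical per-region task: scan one region's rows, write the
    matching index rows, and return only the number of rows upserted."""
    count = 0
    for row_key, indexed_value in region_rows:
        # Assumed index layout: indexed value first, then the data row key.
        with lock:
            index_table[(indexed_value, row_key)] = None
        count += 1
    return count


def distributed_upsert_select(regions):
    """Run one chunked task per region in parallel and sum the counts.
    The row data itself never passes through the caller (the 'client')."""
    index_table = {}
    lock = Lock()
    workers = max(1, len(regions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        counts = list(
            pool.map(lambda rows: upsert_select_chunk(rows, index_table, lock),
                     regions)
        )
    return sum(counts), index_table


if __name__ == "__main__":
    # Two simulated regions of a data table: (row key, indexed column value).
    regions = [[("r1", "a"), ("r2", "b")], [("r3", "c")]]
    total, index_table = distributed_upsert_select(regions)
    print(total, sorted(index_table))
```

In the real system the per-region work would presumably run inside a
coprocessor on each region server, and the writes would go to the index
table's regions rather than a shared dict; the sketch only mirrors the shape
of the data flow.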



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
