[jira] [Commented] (PHOENIX-3271) Distribute UPSERT SELECT across cluster

Ankit Singhal (JIRA) Thu, 19 Jan 2017 23:25:06 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831336#comment-15831336
 ]


Ankit Singhal commented on PHOENIX-3271:
----------------------------------------

I did some stress testing if no. of handlers are adequate, I have not seen any 
problem. should we commit this now?

> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch, 
> PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local 
> index is orders of magnitude faster that creation of global indexes (17 
> seconds versus 10-20 minutes - though more data is written in the global 
> index case). Under the covers, a global index is created through the running 
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a 
> table. In both of these cases, the data being upserted must all flow back to 
> the same client which can become a bottleneck for a large table. Instead, 
> what can be done is to push each separate, chunked UPSERT SELECT call out to 
> a different region server for execution there. One way we could implement 
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT 
> out to each region server and return the number of rows that were upserted 
> back to the client.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (PHOENIX-3271) Distribute UPSERT SELECT across cluster

Reply via email to