[ https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924616#comment-15924616 ]
Lars Hofhansl commented on PHOENIX-3271:
----------------------------------------

Any comment on this? [~an...@apache.org], [~jamestaylor], [~chrajeshbab...@gmail.com]

> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
>                 Key: PHOENIX-3271
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3271
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Ankit Singhal
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch,
> PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch, PHOENIX-3271_v4.patch,
> PHOENIX-3271_v5.patch, PHOENIX-3271_v5_rebased.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local
> index is orders of magnitude faster than creation of global indexes (17
> seconds versus 10-20 minutes - though more data is written in the global
> index case). Under the covers, a global index is created through the running
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a
> table. In both of these cases, the data being upserted must all flow back to
> the same client which can become a bottleneck for a large table. Instead,
> what can be done is to push each separate, chunked UPSERT SELECT call out to
> a different region server for execution there. One way we could implement
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT
> out to each region server and return the number of rows that were upserted
> back to the client.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
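
The idea in the issue description (run each chunk of the UPSERT SELECT on the region server that holds the data, and send only the upserted row count back to the client) can be illustrated with a toy simulation. This is only a sketch and not Phoenix or HBase code: the region map, the function names, and the thread pool standing in for region servers are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a source table split into regions by row key.
SOURCE_REGIONS = {
    "region-a": [("k1", 10), ("k2", 20)],
    "region-b": [("k3", 30)],
    "region-c": [("k4", 40), ("k5", 50), ("k6", 60)],
}


def upsert_select_chunk(rows, target):
    """Simulate one chunk of the UPSERT SELECT running next to its data.

    Rows are written straight to the target; only the count is returned,
    so no row data travels back to the caller.
    """
    for key, value in rows:
        target[key] = value
    return len(rows)


def distributed_upsert_select(regions, target):
    """Fan out one chunk per region and sum the returned row counts."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        counts = pool.map(
            lambda rows: upsert_select_chunk(rows, target),
            regions.values(),
        )
        return sum(counts)


target_table = {}
total = distributed_upsert_select(SOURCE_REGIONS, target_table)
print(total)
```

The client-side function never sees the rows themselves; it only aggregates per-region counts, which is the property that removes the single-client bottleneck described above.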