[
https://issues.apache.org/jira/browse/PHOENIX-3271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15842317#comment-15842317
]
James Taylor edited comment on PHOENIX-3271 at 1/27/17 6:33 AM:
----------------------------------------------------------------
I like your long term ideas, [~enis] (JIRA please?), but I think this patch is
good in the near term. The timeouts should be prevented by our RenewLease
client side impl (by [~samarthjain]). If HBase let us renew leases on the
server side, that'd be an improvement, but what we have works.
IMHO, having a safeguard config would lead to code duplication and make
maintenance harder. I think we're ok without it (provided we do adequate
testing). This patch should improve global index build times substantially.
> Distribute UPSERT SELECT across cluster
> ---------------------------------------
>
> Key: PHOENIX-3271
> URL: https://issues.apache.org/jira/browse/PHOENIX-3271
> Project: Phoenix
> Issue Type: Improvement
> Reporter: James Taylor
> Assignee: Ankit Singhal
> Fix For: 4.10.0
>
> Attachments: PHOENIX-3271.patch, PHOENIX-3271_v1.patch,
> PHOENIX-3271_v2.patch, PHOENIX-3271_v3.patch, PHOENIX-3271_v4.patch,
> PHOENIX-3271_v5.patch
>
>
> Based on some informal testing we've done, it seems that creation of a local
> index is orders of magnitude faster than creation of global indexes (17
> seconds versus 10-20 minutes - though more data is written in the global
> index case). Under the covers, a global index is created through the running
> of an UPSERT SELECT. Also, UPSERT SELECT provides an easy way of copying a
> table. In both of these cases, the data being upserted must all flow back to
> the same client which can become a bottleneck for a large table. Instead,
> what can be done is to push each separate, chunked UPSERT SELECT call out to
> a different region server for execution there. One way we could implement
> this would be to have an endpoint coprocessor push the chunked UPSERT SELECT
> out to each region server and return the number of rows that were upserted
> back to the client.
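
To make the idea in the description concrete, here is a rough toy model of the scatter/gather shape it proposes: one chunked UPSERT SELECT per region, executed where the data lives, with only a row count returned to the client. This is a sketch under assumptions, not Phoenix or HBase code. The regions are plain in-memory dicts, and the names source_regions, target_table, upsert_select_chunk and distributed_upsert_select are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy model only: each "region" is a dict holding a slice of the source
# table's rows, keyed by row key. Nothing here is Phoenix or HBase API.
source_regions = [
    {"a": 1, "b": 2},
    {"m": 3},
    {"x": 4, "y": 5, "z": 6},
]

# Stand-in for the table (or global index) being written to.
target_table = {}


def upsert_select_chunk(region_rows, target):
    """Run one chunk "server side": copy a region's rows into the target
    and return only the number of rows upserted."""
    for key, value in region_rows.items():
        target[key] = value
    return len(region_rows)


def distributed_upsert_select(regions, target):
    """Fan one chunk out per region and sum the per-region row counts."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        counts = pool.map(lambda rows: upsert_select_chunk(rows, target), regions)
        return sum(counts)


# The "client" receives a single integer, never the row data itself.
total = distributed_upsert_select(source_regions, target_table)
print(total)
```

The point of the shape is that the rows never pass through the client, which is the bottleneck the description calls out. How chunks would really be dispatched (for example through an endpoint coprocessor) is left open here.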