[ 
https://issues.apache.org/jira/browse/PHOENIX-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14247645#comment-14247645
 ] 

James Taylor edited comment on PHOENIX-1520 at 12/16/14 2:58 AM:
-----------------------------------------------------------------

One possibility is to use map-reduce to populate the index if the table is 
larger than a threshold (PHOENIX-413). We'd get restartability and progress 
tracking in that case (at the cost of slower index population).

We might be able to leverage the new map-reduce support over Phoenix tables 
(PHOENIX-1454). [~maghamravi] - is it possible to run an UPSERT SELECT command 
through the new map-reduce functionality to do the initial secondary index 
population?



> Provide a means of tracking progress of secondary index population
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-1520
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1520
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: Dave Hacker
>
> When an index is created against a table that already has a substantial 
> amount of data, the initial population of the index can take a long time. We 
> should provide a means of monitoring what percentage of the task is complete.
> It's possible that this could be done in a way that is general enough to 
> apply to any Phoenix query. The secondary index population is done through an 
> UPSERT SELECT statement that selects from the data table and upserts into the 
> index table. We have table stats up front that tell us how many guidepost 
> chunks will be iterated over. We could monitor the thread pool, based on the 
> tasks queued in the pool by ParallelIterators, to get an idea of the total 
> number of remaining tasks.
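
As a rough illustration of the thread-pool idea in the description, here is a minimal Java sketch. It does not use Phoenix's actual pool or ParallelIterators internals. It assumes a plain java.util.concurrent.ThreadPoolExecutor as a stand-in, with one sleeping task per hypothetical guidepost chunk. The class name PoolProgress and both percentComplete helpers are made up for this example. The counters it reads (getTaskCount, getCompletedTaskCount) are documented as approximate, so the result is an estimate, not an exact figure.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: estimates progress from thread pool task counters.
class PoolProgress {

    /** Percentage (0.0 to 100.0) of tasks finished; 0.0 if nothing was scheduled. */
    public static double percentComplete(long completed, long total) {
        if (total <= 0) {
            return 0.0;
        }
        // Clamp so an approximate counter can never report more than 100%.
        return 100.0 * Math.min(completed, total) / total;
    }

    /** Same estimate, read from the pool's own (approximate) counters. */
    public static double percentComplete(ThreadPoolExecutor pool) {
        return percentComplete(pool.getCompletedTaskCount(), pool.getTaskCount());
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4, 4, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        int chunks = 20; // assumed stand-in for the number of guidepost chunks
        for (int i = 0; i < chunks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(50); // simulates scanning one chunk
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        pool.shutdown();
        // Poll until all queued tasks have drained, reporting progress each time.
        while (!pool.awaitTermination(100, TimeUnit.MILLISECONDS)) {
            System.out.printf("%.0f%% complete%n", percentComplete(pool));
        }
        System.out.printf("%.0f%% complete%n", percentComplete(pool));
    }
}
```

If the total number of chunks is known up front from table stats, as the description suggests, it could be passed as the denominator through the two-argument overload instead of relying on the pool's task count.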



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
