[ 
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14326252#comment-14326252
 ] 

Lars Hofhansl commented on PHOENIX-1609:
----------------------------------------

Yeah, most of the blocks are in place!

I would start with not having Phoenix trigger the M/R or Spark job. That would 
require additional (tricky?) setup, and one might not realize one needs that 
until an index is created. Of course in the long run it would be *far* more 
convenient if Phoenix did it all automatically.
Using M/R is only one way to seed an index. Folks might want to write all 
kinds of jobs to seed an index (maybe even from external data).

Maybe we can later add a location of a script (or a jar as was suggested above) 
to the index creation command. Failure handling would be tricky, I suppose.

So it seems the only thing that is really missing is a way to create an index 
in an unfinished state and let an external tool finish the job asynchronously.
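
To make the idea concrete, here is a rough sketch of that lifecycle. It is a toy 
in-memory model only, not Phoenix code: the names IndexState, Index, 
create_index_async and seed_index are all made up for illustration, and a real 
implementation would keep the state in the catalog rather than in an object.

```python
from enum import Enum


class IndexState(Enum):
    BUILDING = "building"  # index exists but has not been populated yet
    ACTIVE = "active"      # index is fully populated and safe to query


class Index:
    """Hypothetical in-memory stand-in for an index (not a Phoenix API)."""

    def __init__(self, name, column):
        self.name = name
        self.column = column
        self.state = IndexState.BUILDING
        self.entries = {}  # indexed value -> list of row keys

    def usable(self):
        # Queries should only use the index once it is marked ACTIVE.
        return self.state is IndexState.ACTIVE


def create_index_async(name, column):
    # Index creation returns immediately; no data is copied here.
    return Index(name, column)


def seed_index(index, rows):
    # Stands in for the external job (M/R, Spark, or a custom script).
    for row_key, row in rows.items():
        index.entries.setdefault(row[index.column], []).append(row_key)
    # Only after the data is in place is the index flipped to ACTIVE.
    index.state = IndexState.ACTIVE
```

The point is just the separation: creation leaves the index in a not-yet-usable 
state, and whoever seeds it is responsible for flipping it to active afterwards.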


> MR job to populate index tables 
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables way after the data 
> exists on the master tables.  It would be good to have a simple MR job given 
> by the phoenix code that users can call to have indexes in sync with the 
> master table. 
> Users can invoke the MR job using the following command 
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt 
> INDEX_TABLE -columns a,b,c
> Is this ideal? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
