[ https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325478#comment-14325478 ]
James Taylor commented on PHOENIX-1609:
---------------------------------------

[~lhofhansl] - good idea about the ASYNC keyword. I was thinking we could use MR if the size of the data table is over a certain threshold, but making it explicit might be better.

As for the building blocks, we have most of them already. Our existing MR integration can run a SELECT statement as an MR job (http://phoenix.apache.org/phoenix_mr.html). That's more than half the battle, as index population is done through an UPSERT SELECT query. We just need a way of piping the SELECT results into the index table.

We also have a mechanism for directly creating HFiles through our CSV Bulk Loader (http://phoenix.apache.org/bulk_dataload.html), which generates UPSERT statements under the covers and collects the underlying KeyValues to build the HFile. Perhaps some of this code can be leveraged/refactored.

The one point we're not sure about is whether we should invoke the MR job from our Phoenix client when a CREATE INDEX ASYNC is done, or whether we require the user to initiate the MR job through the more standard hadoop jar mechanism (outside of Phoenix).

> MR job to populate index tables
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables way after the data
> exists on the master tables. It would be good to have a simple MR job given
> by the phoenix code that users can call to have indexes in sync with the
> master table.
> Users can invoke the MR job using the following command
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt
> INDEX_TABLE -columns a,b,c
> Is this ideal?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
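
For illustration only, here is a rough sketch of the kind of UPSERT SELECT statement that index population boils down to, built from the same three inputs as the proposed command (-st, -tt, -columns). The function name build_index_upsert is hypothetical and not part of Phoenix. The sketch also assumes the index table exposes the same column names as the data table, which is a simplification: a real Phoenix index table also carries the data table's primary key columns and may name its columns differently.

```python
def build_index_upsert(source_table, index_table, columns):
    """Build a simplified UPSERT SELECT that copies `columns` from the
    data table into the index table.

    Hypothetical helper for illustration; assumes both tables use the
    same column names, which real Phoenix index tables may not.
    """
    if not columns:
        raise ValueError("at least one column is required")
    col_list = ", ".join(columns)
    return (
        f"UPSERT INTO {index_table} ({col_list}) "
        f"SELECT {col_list} FROM {source_table}"
    )


if __name__ == "__main__":
    # Same inputs as the example command in the issue description.
    print(build_index_upsert("MASTER_TABLE", "INDEX_TABLE", ["a", "b", "c"]))
```

An MR-based version would run the SELECT half as the job's input and write the resulting rows to the index table, rather than executing the whole statement from a single client.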