[jira] [Commented] (PHOENIX-1609) MR job to populate index tables

Gabriel Reid (JIRA) Mon, 02 Mar 2015 08:39:55 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343350#comment-14343350
 ]


Gabriel Reid commented on PHOENIX-1609:
---------------------------------------

Patch looks pretty good to me; I noticed just a few minor things:
* It looks like CsvToKeyValueMapper#loadPreUpsertProcessor and 
PhoenixConfigurationUtil#loadPreUpsertProcessor are copies of each other, so 
they can be reduced to a single implementation
* I noticed that the string separator in the ColumnInfo class is changed -- 
just curious, why is that?
* There appear to be two nearly identical copies of 
QueryUtil#constructUpsertStatement, although one takes a hint parameter. I 
think the non-hint version could just delegate to the version with a hint, and 
that way we can reduce code duplication
* The number of reducers is unnecessarily set to 0 in IndexTool -- this can be 
removed. It'll be overwritten by the HBase setup of the job anyhow, but having 
that call there to explicitly set the number of reducers to 0 gives the 
impression that it's supposed to be doing something
* There are some long option names in IndexTool that contain spaces (e.g. Data 
table, Index Table). These parameters are meant to be supplied using the 
--long-parameter-name notation, so I'm not sure what will happen when they 
contain spaces, but I don't think it'll be good. These should probably be 
data-table, index-table, etc.
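
To illustrate the delegation suggested above for 
QueryUtil#constructUpsertStatement, here is a minimal sketch. It is not the 
code from the patch: the class name UpsertStatements, the use of plain column 
name strings, and the exact SQL layout are all assumptions made for 
illustration. The point is only that the hint-less overload contains no logic 
of its own.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the two overloads discussed in the review;
// names and parameter types are assumptions, not the actual Phoenix code.
public class UpsertStatements {

    // Hint-less overload: delegates so the statement is built in one place.
    public static String constructUpsertStatement(String tableName, List<String> columns) {
        return constructUpsertStatement(tableName, columns, null);
    }

    // Single implementation; a null or empty hint means "no hint clause".
    public static String constructUpsertStatement(String tableName, List<String> columns, String hint) {
        if (columns == null || columns.isEmpty()) {
            throw new IllegalArgumentException("At least one column is required");
        }
        String hintClause = (hint == null || hint.isEmpty()) ? "" : "/*+ " + hint + " */ ";
        // One "?" bind placeholder per column.
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "UPSERT " + hintClause + "INTO " + tableName
                + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) {
        List<String> cols = Arrays.asList("A", "B", "C");
        System.out.println(constructUpsertStatement("INDEX_TABLE", cols));
        System.out.println(constructUpsertStatement("INDEX_TABLE", cols, "NO_INDEX"));
    }
}
```

With this shape, any later change to how the statement is built only has to be 
made in the three-argument version.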


> MR job to populate index tables 
> --------------------------------
>
>                 Key: PHOENIX-1609
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1609
>             Project: Phoenix
>          Issue Type: New Feature
>            Reporter: maghamravikiran
>            Assignee: maghamravikiran
>         Attachments: 0001-PHOENIX-1609-4.0.patch, 
> 0001-PHOENIX-1609-wip.patch, 0001-PHOENIX_1609.patch
>
>
> Often, we need to create new indexes on master tables long after the data 
> already exists in them.  It would be good to have a simple MR job, provided 
> by Phoenix, that users can run to bring indexes in sync with the 
> master table. 
> Users can invoke the MR job using the following command 
> hadoop jar org.apache.phoenix.mapreduce.Index -st MASTER_TABLE -tt 
> INDEX_TABLE -columns a,b,c
> Is this ideal? 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
