[ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679480#comment-14679480
 ] 

ravi commented on PHOENIX-2154:
-------------------------------

Right now, the job does both tasks: generating the HFiles and then loading 
them onto the target table. Should we try to break it into two steps, 
where 
1) The job runs the HFile generation code. We run it with submit() rather 
than waitForCompletion(), so the client returns immediately. 
2) Once the job finishes, the client runs 
org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the HFiles 
onto the table. 
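
A rough sketch of what I mean by the two steps, just to make the control 
flow concrete. JobHandle, HFileLoader and TwoPhaseIndexBuild are made-up 
names for illustration, not existing Phoenix classes. JobHandle stands in 
for the submit() / isComplete() / isSuccessful() methods of 
org.apache.hadoop.mapreduce.Job, and HFileLoader stands in for whatever 
call ends up invoking LoadIncrementalHFiles:

```java
import java.util.Objects;

/** Hypothetical stand-in for the parts of org.apache.hadoop.mapreduce.Job used here. */
interface JobHandle {
    void submit() throws Exception;            // returns without waiting for the job
    boolean isComplete() throws Exception;     // true once the job has finished
    boolean isSuccessful() throws Exception;   // only meaningful after completion
}

/** Hypothetical stand-in for the step that runs LoadIncrementalHFiles. */
interface HFileLoader {
    void load(String hfileDir, String tableName) throws Exception;
}

/** Illustrative only: splits HFile generation and HFile loading into two steps. */
final class TwoPhaseIndexBuild {
    enum State { RUNNING, LOADED, FAILED }

    private final JobHandle job;
    private final HFileLoader loader;
    private final String hfileDir;
    private final String tableName;
    private boolean loaded = false;

    TwoPhaseIndexBuild(JobHandle job, HFileLoader loader,
                       String hfileDir, String tableName) {
        this.job = Objects.requireNonNull(job);
        this.loader = Objects.requireNonNull(loader);
        this.hfileDir = hfileDir;
        this.tableName = tableName;
    }

    /** Step 1: kick off HFile generation and return to the caller immediately. */
    void start() throws Exception {
        job.submit();
    }

    /** Step 2: called later by the client; loads the HFiles once the job succeeded. */
    State poll() throws Exception {
        if (loaded) {
            return State.LOADED;       // never load the same HFiles twice
        }
        if (!job.isComplete()) {
            return State.RUNNING;
        }
        if (!job.isSuccessful()) {
            return State.FAILED;       // nothing is loaded onto the table
        }
        loader.load(hfileDir, tableName);
        loaded = true;
        return State.LOADED;
    }
}
```

The client would call start() once and then poll() whenever it wants to 
check on progress. How the client remembers the job between invocations 
is left open here. 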

Currently, the mapper output runs through KeyValueSortReducer, a Reducer 
class that is responsible for writing the output in HFile format. To keep 
state across jobs (when failures happen), we would have to write the map 
output to HDFS and then run a subsequent job that loads the previous map 
output and writes it to HFiles through the KeyValueSortReducer. I'm not 
sure we want to go down this path.

> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows getting 
> added *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
