[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

Rajeshbabu Chintaguntla (JIRA) Wed, 26 Aug 2015 17:15:06 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715791#comment-14715791
 ]


Rajeshbabu Chintaguntla commented on PHOENIX-2154:
--------------------------------------------------

bq. are we taking advantage of knowing the split points? No reduce phase should 
be necessary, so we can take the same approach as we're looking at now for 
front-door-HBase-APIs MR build:
I have not fully checked the IndexTool code yet. I will check and get back to you.

bq.  it seems like there'd be corner cases in which the data table may split 
while the index is being built - it's unclear to me how this scenario would be 
handled.
In the normal case, when a region splits during the MapReduce phase, the HFile 
data is separated into the child regions before loading. HBase does this with a 
half store file reader, but the reader is not configurable in 
LoadIncrementalHFiles. Using the default HalfStoreFileReader implementation may 
corrupt local index data, for example by sending all of the data to the first 
child region only. Fixing this needs a code change in HBase. One thing we can 
do is check, before loading the local index data, whether the split keys are 
still the same, and if any of them have changed, fail the job and tell the user 
to re-run it.
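
A minimal sketch of that pre-load check, assuming the region split keys are 
captured once when the job starts and again just before the bulk load. The 
class and method names here are hypothetical and are not part of IndexTool or 
HBase; how the keys are fetched from the cluster is left out.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Phoenix or HBase: compares two snapshots
// of a table's region split keys and rejects the load if they differ.
final class SplitKeyCheck {

    private SplitKeyCheck() {
    }

    // True only if both snapshots hold the same keys in the same order.
    // Arrays.deepEquals compares the nested byte[] elements by content.
    static boolean sameSplitKeys(byte[][] atJobStart, byte[][] beforeLoad) {
        return Arrays.deepEquals(atJobStart, beforeLoad);
    }

    // Throws if any region boundary changed while the MR job was running,
    // so the local index HFiles are never loaded against a changed layout.
    static void failIfRegionsChanged(byte[][] atJobStart, byte[][] beforeLoad) {
        if (!sameSplitKeys(atJobStart, beforeLoad)) {
            throw new IllegalStateException(
                "Region split keys changed during the index build; "
                    + "please re-run the job.");
        }
    }
}
```

The check is deliberately strict: any added, removed, or moved boundary fails 
the load, which trades a re-run of the job for not depending on the default 
half store file handling.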

bq. it seems there are problems with IndexTool for local indexes, as there are 
scenarios where the MR completes yet a scan over the Phoenix tables says there 
are 0 rows.
Need to check this.

> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: IndexTool.java, PHOENIX-2154-WIP.patch, 
> PHOENIX-2154-_HBase_Frontdoor_API_WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows added 
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
