[jira] [Commented] (PHOENIX-2154) Failure of one mapper should not affect other mappers in MR index build

James Taylor (JIRA) Mon, 17 Aug 2015 22:22:31 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14700752#comment-14700752 ]


James Taylor commented on PHOENIX-2154:
---------------------------------------

Perf results for an 8-node cluster:

| | 100M narrow table (min) | 1B narrow table (min) | 1B wide table (min) |
| Non MR | 10 | 76 | 511 |
| HFile MR | 17 | 161 | 1,375 |
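
For reference, the slowdown of HFile MR relative to Non MR implied by the table can be worked out with a quick sketch like the one below. The class and method names are hypothetical and not part of Phoenix; only the minute values come from the table.

```java
import java.util.Locale;

// Illustrative only: SlowdownRatios is a hypothetical helper, not Phoenix code.
final class SlowdownRatios {
    // Ratio of HFile MR build time to Non MR build time (both in minutes).
    static double ratio(double hfileMrMinutes, double nonMrMinutes) {
        return hfileMrMinutes / nonMrMinutes;
    }

    // Formats one table column as "<label>: <ratio>x" with two decimals.
    static String format(String label, double hfileMrMinutes, double nonMrMinutes) {
        return String.format(Locale.ROOT, "%s: %.2fx", label,
                ratio(hfileMrMinutes, nonMrMinutes));
    }

    public static void main(String[] args) {
        // Minute values taken from the perf table.
        System.out.println(format("100M narrow", 17, 10));
        System.out.println(format("1B narrow", 161, 76));
        System.out.println(format("1B wide", 1375, 511));
    }
}
```

That works out to roughly 1.7x, 2.1x, and 2.7x respectively.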

We'll add the HBase API MR numbers here soon.

As noted by Thomas, the 1B wide table fails to load the HFiles (perhaps because 
hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily needs to be increased), 
so we're not accounting for that time in the numbers above.


> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: IndexTool.java
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done 
> in the event of the failure of one of the other mappers. The initial 
> population of an index is based on a snapshot in time, so new rows getting 
> added *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so 
> there's really no need to dedup. However, the index rows will have a 
> different row key than the data table, so I'm not sure how the HFiles are 
> split. Will they potentially overlap and is this an issue?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
