[ https://issues.apache.org/jira/browse/PHOENIX-2154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14705882#comment-14705882 ]
Lars Hofhansl commented on PHOENIX-2154:
----------------------------------------

Sorry... Bit late.

The specific problem for index builds is that (a) the output goes into an initially empty table and (b) we can't know ahead of time how to presplit the table (unless we do a first pass or sample).
With that, the index build will go through a single reducer, producing a single, possibly humongous, region, because we'll have a table with a single region, which HBase then needs to split, potentially multiple times.

I agree using the HBase front door is not ideal either, but at least we're writing in memstore-size chunks and HBase will split as we go.

> Failure of one mapper should not affect other mappers in MR index build
> -----------------------------------------------------------------------
>
>                 Key: PHOENIX-2154
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2154
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>            Assignee: maghamravikiran
>         Attachments: IndexTool.java, PHOENIX-2154-WIP.patch
>
>
> Once a mapper in the MR index job succeeds, it should not need to be re-done
> in the event of the failure of one of the other mappers. The initial
> population of an index is based on a snapshot in time, so new rows getting
> *after* the index build has started and/or failed do not impact it.
> Also, there's a 1:1 correspondence between index rows and table rows, so
> there's really no need to dedup. However, the index rows will have a
> different row key than the data table, so I'm not sure how the HFiles are
> split. Will they potentially overlap and is this an issue?
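
For illustration only, the "first pass or sample" option mentioned in the comment could look roughly like the sketch below: take a sample of index row keys, sort it, and pick evenly spaced keys as presplit points. This is not taken from IndexTool.java or the attached patch. The function names `choose_split_points` and `region_for` are hypothetical, the keys are plain byte strings, and how the sample is obtained and how the split keys are handed to HBase at table creation are both left out.

```python
from bisect import bisect_right


def choose_split_points(sampled_keys, num_regions):
    """Pick up to num_regions - 1 split keys at evenly spaced positions
    in the sorted, de-duplicated sample (hypothetical helper)."""
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    keys = sorted(set(sampled_keys))
    if num_regions == 1 or len(keys) < 2:
        return []  # nothing to split on: the table stays a single region
    splits = []
    for i in range(1, num_regions):
        idx = i * len(keys) // num_regions
        if idx == 0:
            continue  # splitting at the smallest key would leave an empty first region
        if splits and splits[-1] == keys[idx]:
            continue  # sample too small for this many regions; skip the repeat
        splits.append(keys[idx])
    return splits


def region_for(key, splits):
    """Index of the region that would hold `key`, treating each split key
    as the inclusive start of the next region."""
    return bisect_right(splits, key)


if __name__ == "__main__":
    # Assumed stand-in for a sample of index row keys.
    sample = [b"row%04d" % i for i in range(0, 1000, 10)]
    splits = choose_split_points(sample, 4)
    print(splits)
    print(region_for(b"row0500", splits))
```

With split points chosen this way, the index table would start with several regions instead of one, so the bulk load would not have to funnel everything into a single region that HBase must split afterwards. The quality of the result depends entirely on how representative the sample is, which is the cost the comment alludes to.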