[ https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237744#comment-13237744 ]
Lars Hofhansl commented on HBASE-5604:
--------------------------------------

I went the distributed log splitting route first a while back. Had it all working, in fact. Or so I thought.
Then I tried playing logs to a new table and realized that distributed log splitting only works for crash recovery (before the regions could split further), because it splits using the region name in the log.
That makes sense, because otherwise each region server participating in log splitting would need to look up the current region for each encountered row (the region could have split, and the row in question needs to go to one of the daughters). That is essentially what the high-level API does anyway.

Yes, definitely need a reducer. Similar to what I did for Import, I can see this working in two modes:
# The mappers directly apply changes to a running HBase cluster (TableOutputFormat). No reducers needed in this case.
# Create HFiles via HFileOutputFormat in the reduce phase.

In fact this tool would probably be very much like Import, "just" with a different InputFormat.

> HLog replay tool that generates HFiles for use by LoadIncrementalHFiles.
> ------------------------------------------------------------------------
>
>                 Key: HBASE-5604
>                 URL: https://issues.apache.org/jira/browse/HBASE-5604
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>
> Just an idea I had. Might be useful for restore of a backup using the HLogs.
> This could be an M/R job (with a mapper per HLog file).
> The tool would get a time range and a (set of) table(s). We'd pick the right
> HLogs based on time before the M/R job is started and then have a mapper per
> HLog file.
> The mapper would then go through the HLog, filter out all WALEdits that don't
> fit into the time range or don't belong to any of the tables, and then use
> HFileOutputFormat to generate HFiles.
> Would need to indicate the splits we want, probably from a live table.

--
This message is automatically generated by JIRA.
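
The point in the comment about region names going stale after a split can be illustrated with a small sketch. It is plain Java and does not use any HBase API: `RegionLookupSketch`, its methods, and the region names are hypothetical, and row keys are modeled as `String` rather than `byte[]`. The assumption it relies on is that each region covers the rows from its start key up to the next region's start key, and that the first region has an empty start key.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a table's region layout; not an HBase class.
class RegionLookupSketch {
    // Region start key -> region name, kept in sorted key order.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(startKey, regionName);
    }

    // Simulates a split: the parent disappears, the first daughter takes over
    // the parent's start key and the second daughter starts at splitKey.
    public void split(String parentStartKey, String splitKey,
                      String firstDaughter, String secondDaughter) {
        if (regionsByStartKey.remove(parentStartKey) == null) {
            throw new IllegalArgumentException("no region starts at " + parentStartKey);
        }
        regionsByStartKey.put(parentStartKey, firstDaughter);
        regionsByStartKey.put(splitKey, secondDaughter);
    }

    // Finds the region currently covering a row: the region with the
    // greatest start key that is less than or equal to the row key.
    public String regionFor(String row) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(row);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + row);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "region-1");
        table.addRegion("m", "region-2");

        // Region that a log entry for row "x" would name at write time.
        String loggedRegion = table.regionFor("x");

        // The region splits after the entry was logged.
        table.split("m", "t", "region-2a", "region-2b");

        System.out.println("logged:  " + loggedRegion);
        System.out.println("current: " + table.regionFor("x"));
    }
}
```

Grouping edits by the region name recorded in the log only works while that name still exists; once the region has split, every row has to be looked up again against the current layout, which is the per-row cost the comment describes.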