[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13250393#comment-13250393
 ] 

Zhihong Yu commented on HBASE-5604:
-----------------------------------

{code}
+public class WALPlayer extends Configured implements Tool {
{code}
Javadoc for the above new class is desirable.
{code}
+    public void setup(Context context) {
+      table = Bytes.toBytes(context.getConfiguration().getStrings(TABLES_KEY)[0]);
+    }
{code}
Why is index 0 always used above? What happens when more than one table is configured?
{code}
+    public void setup(Context context) {
+      String[] tableMap = context.getConfiguration().getStrings(TABLE_MAP_KEY);
+      int i = 0;
+      for (String table : context.getConfiguration().getStrings(TABLES_KEY)) {
+        tables.put(Bytes.toBytes(table), Bytes.toBytes(tableMap[i++]));
{code}
I think the lengths of the two String[] should be validated. If they don't 
match, bail out early instead of failing later on tableMap[i++].
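
A minimal sketch of the kind of check I have in mind. It uses plain Strings and a hypothetical helper (TableMapValidation.pairTables is not part of the patch) so that it stands alone; in the mapper, setup() would convert the names with Bytes.toBytes() and might prefer to throw IOException.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the patch: pairs source tables with
// target tables and fails fast when the two lists do not line up.
class TableMapValidation {

  public static Map<String, String> pairTables(String[] tables, String[] tableMap) {
    if (tables == null || tableMap == null) {
      throw new IllegalArgumentException("Both table lists must be configured");
    }
    if (tables.length != tableMap.length) {
      throw new IllegalArgumentException(
          "Expected the same number of source and target tables, got "
              + tables.length + " and " + tableMap.length);
    }
    // LinkedHashMap keeps the configured order, which helps when logging.
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i < tables.length; i++) {
      result.put(tables[i], tableMap[i]);
    }
    return result;
  }
}
```

With this, a mismatch surfaces as a clear error at setup time rather than as an ArrayIndexOutOfBoundsException in the middle of the loop.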
{code}
+            // Aggregate as much as possible into a single Put/Delete
+            // operation before writing to the context.
{code}
Shall we use Put.heapSize() to track the aggregate size of the Put, so that 
we can write to the context once a certain threshold is reached?
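
Roughly what I mean, as a self-contained sketch rather than a drop-in change. SizeBoundedBuffer, the size function and the sink are all hypothetical stand-ins: in the mapper the size would come from Put.heapSize() and the sink would be context.write(...). The threshold value would also have to be chosen (or made configurable) by the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// Hypothetical helper: accumulates items and flushes them to a sink once
// their estimated total size reaches a threshold.
class SizeBoundedBuffer<T> {
  private final long thresholdBytes;
  private final ToLongFunction<T> sizeOf;  // stand-in for a heapSize() call
  private final Consumer<List<T>> sink;    // stand-in for context.write(...)
  private final List<T> pending = new ArrayList<>();
  private long pendingBytes = 0;

  SizeBoundedBuffer(long thresholdBytes, ToLongFunction<T> sizeOf,
      Consumer<List<T>> sink) {
    this.thresholdBytes = thresholdBytes;
    this.sizeOf = sizeOf;
    this.sink = sink;
  }

  // Adds one item and flushes as soon as the running total hits the threshold.
  void add(T item) {
    pending.add(item);
    pendingBytes += sizeOf.applyAsLong(item);
    if (pendingBytes >= thresholdBytes) {
      flush();
    }
  }

  // Hands the pending items to the sink; must also be called once at the
  // end of the input so that the last partial batch is not lost.
  void flush() {
    if (pending.isEmpty()) {
      return;
    }
    sink.accept(new ArrayList<>(pending));
    pending.clear();
    pendingBytes = 0;
  }
}
```

This keeps the aggregation the patch already does, but bounds the memory held by a single mapper when one row receives a very large number of edits.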
                
> M/R tools to replay WAL files
> -----------------------------
>
>                 Key: HBASE-5604
>                 URL: https://issues.apache.org/jira/browse/HBASE-5604
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, HLog-5604-v3.txt
>
>
> Just an idea I had. Might be useful for restore of a backup using the HLogs.
> This could be an M/R job (with a mapper per HLog file).
> The tool would get a timerange and a (set of) table(s). We'd pick the right 
> HLogs based on time before the M/R job is started and then have a mapper per 
> HLog file.
> The mapper would then go through the HLog, filter out all WALEdits that 
> don't fit into the time range or don't belong to any of the tables, and 
> then use HFileOutputFormat to generate HFiles.
> Would need to indicate the splits we want, probably from a live table.
