[jira] [Issue Comment Edited] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

Ted Yu (JIRA) Tue, 21 Jun 2011 22:18:16 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053039#comment-13053039
 ]


Ted Yu edited comment on HBASE-3996 at 6/22/11 5:17 AM:
--------------------------------------------------------

FYI, the patch has a bunch of tabs in it instead of two-space indentation, and 
some lines are longer than 80 chars, but no biggie -- I can fix that on commit.  
Here are a few comments.

In TableSplit you create an HTable instance.  Do you need to?  And when you 
create it, though I believe it will be less of a problem going forward, can you 
use the constructor that takes a Configuration and table name?  Is there a 
close in the Split interface?  If so, you might want to call close on your 
HTable in there.  (Where is it used?  Does each split need its own HTable?)  
Use the constructor that takes a Configuration here too...
{noformat}
 +    HTable table = new HTable(tic.getTableName());$
{noformat}
You don't need the e.printStackTrace() call in the code below:

{code}
+    Log.warn("Failed to convert Scan to Strting", e);$
+    e.printStackTrace();$
{code}

Nice javadoc.

By any chance is the code here in MultiTableInputFormatBase where we are 
checking start and end rows copied from elsewhere? 

Otherwise patch looks great.  Test too.


On the e.printStackTrace() call: the Log.warn line above it already outputs the 
stack trace (and fix the spelling of "Strting" too!).
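
To illustrate why the extra call is redundant, here is a minimal sketch. It uses java.util.logging as a stand-in so that it runs without extra jars; the patch itself calls Log.warn, and the class and method names below (LogWithThrowable, logAndCapture) are hypothetical. The idea is the same either way: once the Throwable is handed to the logger, the formatted record already carries the stack trace.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class LogWithThrowable {

  // Logs a warning with the exception attached and returns what the handler wrote.
  static String logAndCapture(String message, Throwable cause) {
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    StreamHandler handler = new StreamHandler(sink, new SimpleFormatter());
    Logger logger = Logger.getLogger("LogWithThrowable.demo");
    logger.setUseParentHandlers(false); // keep the demo output off the console
    logger.addHandler(handler);
    try {
      // Passing the Throwable is enough; no separate printStackTrace() is needed.
      logger.log(Level.WARNING, message, cause);
      handler.flush();
    } finally {
      logger.removeHandler(handler);
    }
    return new String(sink.toByteArray(), StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String out = logAndCapture("Failed to convert Scan to String",
        new IOException("simulated failure"));
    // The captured text contains the message, the exception, and its "at ..." frames.
    System.out.print(out);
  }
}
```

Calling e.printStackTrace() in addition would only write the same trace a second time, to stderr, outside the logging configuration.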

You remove the hashCode in TableSplit.  Should it have one?
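
For context on why this matters: if TableSplit overrides equals (an assumption; I have not checked the final patch), Java's contract expects hashCode to agree with it, otherwise splits misbehave in HashSet/HashMap. A minimal sketch with a hypothetical stand-in class, SplitKey, whose fields are guesses at what identifies a split and not the real TableSplit layout:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a split; the real TableSplit has its own fields.
public final class SplitKey {
  private final byte[] tableName;
  private final byte[] startRow;
  private final byte[] endRow;

  public SplitKey(byte[] tableName, byte[] startRow, byte[] endRow) {
    // Defensive copies so later changes to the caller's arrays cannot alter the key.
    this.tableName = tableName.clone();
    this.startRow = startRow.clone();
    this.endRow = endRow.clone();
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof SplitKey)) return false;
    SplitKey other = (SplitKey) o;
    // byte[] needs Arrays.equals; == would compare references only.
    return Arrays.equals(tableName, other.tableName)
        && Arrays.equals(startRow, other.startRow)
        && Arrays.equals(endRow, other.endRow);
  }

  @Override
  public int hashCode() {
    // Built from the same fields as equals, so equal keys hash alike.
    int result = Arrays.hashCode(tableName);
    result = 31 * result + Arrays.hashCode(startRow);
    result = 31 * result + Arrays.hashCode(endRow);
    return result;
  }

  static byte[] bytes(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    Set<SplitKey> seen = new HashSet<SplitKey>();
    seen.add(new SplitKey(bytes("t1"), bytes("a"), bytes("m")));
    // An equal but distinct instance is found only because hashCode matches equals.
    System.out.println(seen.contains(new SplitKey(bytes("t1"), bytes("a"), bytes("m"))));
  }
}
```

Without the hashCode override, the lookup in main would fall back to identity hashing and almost certainly miss the equal instance.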



      was (Author: stack):
    FYI, patch has bunch of tabs in it instead of two spaces for tabs and some 
lines > 80 chars but no biggie -- I can fix that on commit.  Here's a few 
comments.

In TableSplit you create an HTable instance.  Do you need to?  And when you 
create it, though I believe it will be less of a problem going forward, can you 
use the constructor that takes a Configuration and table name?  Is there a 
close in Split interface?  If so, you might want to call close of your HTable 
in there.  (Where is it used?  Each split needs its own HTable?)  Use the 
constructor that takes a Configuration here too... +    HTable table = new 
HTable(tic.getTableName());$

You don't need the e.printStackTrace in below

{code}
+    Log.warn("Failed to convert Scan to Strting", e);$
+    e.printStackTrace();$
{code}

Nice javadoc.

By any chance is the code here in MultiTableInputFormatBase where we are 
checking start and end rows copied from elsewhere? 

Otherwise patch looks great.  Test too.


The line above it will output the stack trace (spelling too!).

You remove the hashCode in TableSplit.  Should it have one?


  
> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-3996
>                 URL: https://issues.apache.org/jira/browse/HBASE-3996
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Eran Kutner
>             Fix For: 0.90.4
>
>         Attachments: MultiTableInputFormat.patch, 
> TestMultiTableInputFormat.java.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple 
> scanners on a single table can save a lot of time when running map/reduce 
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
