[ 
https://issues.apache.org/jira/browse/HBASE-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193027#comment-13193027
 ] 

Alexey Romanenko commented on HBASE-2965:
-----------------------------------------

It seems this has not been implemented yet, has it?
                
> Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2965
>                 URL: https://issues.apache.org/jira/browse/HBASE-2965
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapred, mapreduce
>            Reporter: Adam Warrington
>            Assignee: Ophir Cohen
>            Priority: Minor
>
> This feature would be helpful for doing reduce-side joins, or even for 
> passing similarly structured data from multiple tables through MapReduce. 
> The API I envision would be very similar to the existing MultipleInputs, 
> parts of which could be reused.
> MultipleTableInputs would have a public API like:
> class MultipleTableInputs {
>   public static void addInputTable(Job job, Table table, Scan scan,
>       Class<? extends TableInputFormatBase> inputFormatClass,
>       Class<? extends Mapper> mapperClass);
> }
> MultipleTableInputs would build a mapping of Tables to configured 
> TableInputFormats the same way MultipleInputs builds a mapping between Paths 
> and InputFormats. Most people will probably use TableInputFormat.class as 
> the input format class. A TableInputFormat instance configures the private 
> scan and table members of TableInputFormatBase when it is created (from 
> within its setConf() method), so the MultipleTableInputs implementation will 
> have to replace them by calling setScan and setHTable with the table and 
> scan that are passed into addInputTable above. MultipleTableInputs' 
> addInputTable() member function would also set the input format for the job 
> to DelegatingTableInputFormat, described below.
> A new class called DelegatingTableInputFormat would be analogous to 
> DelegatingInputFormat. Its getSplits() would return TaggedInputSplits (the 
> same TaggedInputSplit object that the Hadoop DelegatingInputFormat uses), 
> which tag each split with its InputFormat and Mapper. These are created by 
> looping through the HTable to InputFormat mappings, calling getSplits on 
> each input format, and passing the split, the input format, and the mapper 
> as constructor args to TaggedInputSplit.
> The createRecordReader() function in DelegatingTableInputFormat could have 
> the same implementation as the Hadoop DelegatingInputFormat.
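
To make the proposed flow concrete, here is a minimal plain-Java sketch of the two steps described above: registering tables with addInputTable(), then looping over the registrations to produce tagged splits. It is only a model under stated assumptions, not an implementation against HBase or Hadoop. The class name MultipleTableInputsSketch, the TableInput and TaggedSplit types, and the use of strings for tables, scans, regions, input formats, and mappers are all hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the proposed flow. Every type here is a hypothetical
// stand-in; no Hadoop or HBase classes are used.
class MultipleTableInputsSketch {

    // Stand-in for one addInputTable() registration.
    static final class TableInput {
        final String scan;
        final String inputFormat;
        final String mapper;

        TableInput(String scan, String inputFormat, String mapper) {
            this.scan = scan;
            this.inputFormat = inputFormat;
            this.mapper = mapper;
        }
    }

    // Stand-in for a TaggedInputSplit: a split plus the input format and
    // mapper that should process it.
    static final class TaggedSplit {
        final String table;
        final String region;
        final String inputFormat;
        final String mapper;

        TaggedSplit(String table, String region, String inputFormat, String mapper) {
            this.table = table;
            this.region = region;
            this.inputFormat = inputFormat;
            this.mapper = mapper;
        }

        @Override
        public String toString() {
            return table + "/" + region + " -> " + inputFormat + ", " + mapper;
        }
    }

    // Table name -> registration, kept in insertion order.
    private final Map<String, TableInput> inputs = new LinkedHashMap<>();

    // Models MultipleTableInputs.addInputTable(): record the table-to-format mapping.
    void addInputTable(String table, String scan, String inputFormat, String mapper) {
        inputs.put(table, new TableInput(scan, inputFormat, mapper));
    }

    // Models DelegatingTableInputFormat.getSplits(): for each registered table,
    // take its splits (here: region names supplied by the caller) and tag them.
    List<TaggedSplit> getSplits(Map<String, List<String>> regionsByTable) {
        List<TaggedSplit> splits = new ArrayList<>();
        for (Map.Entry<String, TableInput> e : inputs.entrySet()) {
            List<String> regions =
                regionsByTable.getOrDefault(e.getKey(), Collections.<String>emptyList());
            for (String region : regions) {
                TableInput in = e.getValue();
                splits.add(new TaggedSplit(e.getKey(), region, in.inputFormat, in.mapper));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        MultipleTableInputsSketch sketch = new MultipleTableInputsSketch();
        sketch.addInputTable("users", "scanA", "TableInputFormat", "UserMapper");
        sketch.addInputTable("orders", "scanB", "TableInputFormat", "OrderMapper");

        Map<String, List<String>> regions = new HashMap<>();
        regions.put("users", Arrays.asList("r1", "r2"));
        regions.put("orders", Arrays.asList("r3"));

        for (TaggedSplit split : sketch.getSplits(regions)) {
            System.out.println(split);
        }
    }
}
```

In a real implementation the region lists would come from each configured input format's own getSplits(), and the tags would carry classes rather than names, as the description says.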


        
