[ https://issues.apache.org/jira/browse/HBASE-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193027#comment-13193027 ]
Alexey Romanenko commented on HBASE-2965:
-----------------------------------------

It seems this is not implemented yet, is it?

> Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2965
>                 URL: https://issues.apache.org/jira/browse/HBASE-2965
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapred, mapreduce
>            Reporter: Adam Warrington
>            Assignee: Ophir Cohen
>            Priority: Minor
>
> This feature would be helpful for doing reduce-side joins, or even passing
> similarly structured data from multiple tables through MapReduce. The API I
> envision would be very similar to the existing MultipleInputs, parts of
> which could be reused.
>
> MultipleTableInputs would have a public API like:
>
> class MultipleTableInputs {
>   public static void addInputTable(Job job, Table table, Scan scan,
>       Class<? extends TableInputFormatBase> inputFormatClass,
>       Class<? extends Mapper> mapperClass);
> };
>
> MultipleTableInputs would build a mapping of Tables to configured
> TableInputFormats the same way MultipleInputs builds a mapping between Paths
> and InputFormats. Since most people will probably use TableInputFormat.class
> as the input format class, the MultipleTableInputs implementation will have
> to replace the TableInputFormatBase's private scan and table members, which
> are configured when an instance of TableInputFormat is created (from within
> its setConf() method), by calling setScan and setHTable with the table and
> scan that are passed into addInputTable above. MultipleTableInputs'
> addInputTable() method would also set the input format for the job to
> DelegatingTableInputFormat, described below.
>
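
To make the proposed table-to-input-format mapping concrete, here is a minimal, dependency-free sketch of how addInputTable() might record its arguments. Everything in it is an assumption for illustration: a plain Map stands in for the job's Configuration, the table, scan, and class arguments are plain strings, and the class name MultipleTableInputsSketch and the configuration key are hypothetical. It also assumes the values contain no ';' or ',' characters; a real Scan would have to be serialized in a form that is safe to embed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the proposed MultipleTableInputs helper.
// A Map<String, String> plays the role of the job configuration.
final class MultipleTableInputsSketch {
    // Hypothetical key; not an actual HBase or Hadoop property name.
    static final String TABLE_FORMATS = "sketch.multipletableinputs.table.formats";

    private MultipleTableInputsSketch() {}

    /** Appends one "table;scan;inputFormat;mapper" entry to the configuration. */
    static void addInputTable(Map<String, String> conf, String table, String scan,
                              String inputFormatClass, String mapperClass) {
        String entry = String.join(";", table, scan, inputFormatClass, mapperClass);
        // First call stores the entry; later calls append it after a comma.
        conf.merge(TABLE_FORMATS, entry, (existing, added) -> existing + "," + added);
    }

    /** Decodes the stored entries back into one String[4] per registered table. */
    static List<String[]> getTableInputs(Map<String, String> conf) {
        List<String[]> inputs = new ArrayList<>();
        String raw = conf.get(TABLE_FORMATS);
        if (raw == null) {
            return inputs; // nothing registered yet
        }
        for (String entry : raw.split(",")) {
            inputs.add(entry.split(";"));
        }
        return inputs;
    }
}
```

Calling addInputTable() once per table and then getTableInputs() on the same map returns the registrations in insertion order, which is the information a delegating input format would need at split time.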
> A new class called DelegatingTableInputFormat would be analogous to
> DelegatingInputFormat. Its getSplits() would return TaggedInputSplits (the
> same TaggedInputSplit class that the Hadoop DelegatingInputFormat uses),
> which tag each split with its InputFormat and Mapper. These are created by
> looping through the HTable-to-InputFormat mappings, calling getSplits() on
> each input format, and passing the split, the input format, and the mapper
> as constructor arguments to TaggedInputSplit.
>
> The createRecordReader() function in DelegatingTableInputFormat could have
> the same implementation as the Hadoop DelegatingInputFormat.
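
The getSplits() loop described in the proposal can be sketched in plain Java as follows. This is only an illustration under stated assumptions: TaggedSplit and TableInput are hypothetical stand-ins for TaggedInputSplit and a configured table input format, splits are represented as strings, and each input's own getSplits() is modeled as a Supplier. None of these names come from HBase or Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

final class DelegatingSplitsSketch {
    /** Hypothetical stand-in for TaggedInputSplit: a split plus the format and mapper to use on it. */
    static final class TaggedSplit {
        final String split;
        final String inputFormat;
        final String mapper;

        TaggedSplit(String split, String inputFormat, String mapper) {
            this.split = split;
            this.inputFormat = inputFormat;
            this.mapper = mapper;
        }
    }

    /** Hypothetical stand-in for one configured table input. */
    static final class TableInput {
        final String inputFormat;
        final String mapper;
        final Supplier<List<String>> splitSource; // models the format's own getSplits()

        TableInput(String inputFormat, String mapper, Supplier<List<String>> splitSource) {
            this.inputFormat = inputFormat;
            this.mapper = mapper;
            this.splitSource = splitSource;
        }
    }

    private DelegatingSplitsSketch() {}

    /** Asks every registered input for its splits and tags each one with its format and mapper. */
    static List<TaggedSplit> getSplits(Map<String, TableInput> inputsByTable) {
        List<TaggedSplit> tagged = new ArrayList<>();
        for (TableInput input : inputsByTable.values()) {
            for (String split : input.splitSource.get()) {
                tagged.add(new TaggedSplit(split, input.inputFormat, input.mapper));
            }
        }
        return tagged;
    }
}
```

Because every split carries its own input format and mapper, the record-reader side can dispatch on the tag alone, which is why the proposal expects createRecordReader() to need no table-specific logic.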