[
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13240765#comment-13240765
]
Ming Ma commented on HBASE-3996:
--------------------------------
I would appreciate it if anyone could clarify what types of applications would
benefit from this.
1. Is this work meant to improve HBase MapReduce job performance? If so,
Eran, do you have any data for that? A couple of months ago I tried scanning
multiple regions in one mapper task; that only helped when each mapper task
took less than a couple of minutes, so that MapReduce task scheduling became
the dominant overhead.
2. In the multi-table scenario, if we assume different tables have different
schemas, does that mean the application's mapper implementation needs to
handle input from different tables?
> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
> Key: HBASE-3996
> URL: https://issues.apache.org/jira/browse/HBASE-3996
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Reporter: Eran Kutner
> Assignee: Eran Kutner
> Fix For: 0.96.0
>
> Attachments: 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt,
> 3996-v6.txt, 3996-v7.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple
> scanners on a single table can save a lot of time when running map/reduce
> jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.