[ https://issues.apache.org/jira/browse/HBASE-5140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181604#comment-13181604 ]
Zhihong Yu commented on HBASE-5140:
-----------------------------------

MAPREDUCE-1220, referenced in HBASE-4063, has been resolved against Hadoop 0.23, so we cannot use it at the moment.

@Josh: I believe the single-region scenario is the degenerate case. Using the max value for long should be fine for that case. The best practice is to presplit when creating the table.

> TableInputFormat subclass to allow N number of splits per region during MR jobs
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-5140
>                 URL: https://issues.apache.org/jira/browse/HBASE-5140
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: Josh Wymer
>            Priority: Trivial
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> In regards to [HBASE-5138|https://issues.apache.org/jira/browse/HBASE-5138] I
> am working on a subclass for the TableInputFormat class that overrides
> getSplits in order to generate N number of splits per region and/or N number
> of splits per job. The idea is to convert the startKey and endKey for each
> region from byte[] to BigDecimal, take the difference, divide by N, convert
> back to byte[] and generate splits on the resulting values. Assuming your
> keys are fully distributed this should generate splits at nearly the same
> number of rows per split. Any suggestions on this issue are welcome.
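
For reference, the key arithmetic described in the issue can be sketched roughly as below. This is only an illustration under stated assumptions, not the code proposed for HBASE-5140: the class KeyRangeSplitter, the method boundaries and the minWidth parameter are hypothetical names, BigInteger stands in for the BigDecimal mentioned in the description (the keys are treated as unsigned integers, so no fractional part is needed), and an empty end key is assumed to mean "all 0xFF bytes at the padded width", which is one possible way to handle the single-region case.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: computes N evenly spaced sub-ranges of one region's key range.
public class KeyRangeSplitter {

    // Convert a non-negative integer back into a fixed-width, big-endian key.
    static byte[] toKey(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte, or be shorter than width
        byte[] key = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, key, width - n, n);
        return key;
    }

    // Returns n + 1 boundaries: startKey, n - 1 computed interior keys, endKey.
    // Assumption: an empty endKey (last or only region) is treated as all 0xFF bytes.
    static List<byte[]> boundaries(byte[] startKey, byte[] endKey, int n, int minWidth) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        int width = Math.max(minWidth, Math.max(startKey.length, endKey.length));

        // Right-pad with zero bytes so both keys compare as integers of equal width.
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger hi = endKey.length == 0
                ? BigInteger.ONE.shiftLeft(8 * width).subtract(BigInteger.ONE)
                : new BigInteger(1, Arrays.copyOf(endKey, width));
        if (hi.compareTo(lo) <= 0) {
            throw new IllegalArgumentException("endKey must sort after startKey");
        }

        BigInteger range = hi.subtract(lo);
        BigInteger parts = BigInteger.valueOf(n);
        List<byte[]> result = new ArrayList<>();
        result.add(startKey);
        for (int i = 1; i < n; i++) {
            // Interior boundary i sits at lo + range * i / n (integer division).
            BigInteger point = lo.add(range.multiply(BigInteger.valueOf(i)).divide(parts));
            result.add(toKey(point, width));
        }
        result.add(endKey);
        return result;
    }

    public static void main(String[] args) {
        // Split the range [0x00, 0x10) into four sub-ranges and print each boundary.
        for (byte[] key : boundaries(new byte[] {0x00}, new byte[] {0x10}, 4, 1)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

Each adjacent pair of boundaries would become the start and stop row of one split. If the range holds fewer than N distinct keys, some interior boundaries repeat, so a caller would need to drop the empty sub-ranges. As the issue notes, the splits are only balanced by row count when the keys are evenly distributed.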