[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211698#comment-14211698 ]

Weichen Ye commented on HBASE-12394:

You are welcome to review the latest diff: https://reviews.apache.org/r/27519/diff/#

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0
> Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch,
>   HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394-v6.patch,
>   HBASE-12394.patch, HBase-12394 Document.pdf
>
> Welcome to the ReviewBoard: https://reviews.apache.org/r/27519/
> The latest patch is "Diff Revision 2 (Latest)".
>
> On a Hadoop cluster, a job that takes a large HBase table as input consumes a
> large amount of computing resources. For example, we need to create a job
> with 1000 mappers to scan a table with 1000 regions. This patch lets one
> mapper use multiple regions as input.
>
> To support multiple regions for one mapper, we need a new configuration
> property, "hbase.mapreduce.scan.regionspermapper".
> hbase.mapreduce.scan.regionspermapper controls how many regions are used as
> input for one mapper. For example, if we have an HBase table with 300 regions
> and we set hbase.mapreduce.scan.regionspermapper = 3, then a job that scans
> the table will use only 300/3 = 100 mappers.
>
> In this way, we can control the number of mappers using the following formula:
> Number of Mappers = (Total region numbers) / hbase.mapreduce.scan.regionspermapper
>
> This is an example of the configuration:
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
>
> This is an example of the Java code:
> TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class,
>   Text.class, job);

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
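The mapper-count formula in the issue description can be sketched in plain Java. This is only an illustration, not code from the patch: the class RegionGrouping, the KeyRange type and both method names are hypothetical, and rounding up when the region count is not an exact multiple of the setting is an assumption that the description does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of HBase: groups adjacent regions into mapper splits. */
final class RegionGrouping {

    /** Hypothetical stand-in for a region or a split: the key range [startKey, endKey). */
    static final class KeyRange {
        final String startKey;
        final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }
    }

    /**
     * Number of mappers = total regions / regions per mapper.
     * Rounding up for a remainder is an assumption of this sketch.
     */
    static int mapperCount(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException("invalid region or per-mapper count");
        }
        return (totalRegions + regionsPerMapper - 1) / regionsPerMapper;
    }

    /**
     * Merges every run of regionsPerMapper adjacent regions into one range that
     * starts at the first region's start key and ends at the last region's end key.
     * The input list is assumed to be sorted by start key.
     */
    static List<KeyRange> group(List<KeyRange> regions, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        List<KeyRange> splits = new ArrayList<>();
        for (int i = 0; i < regions.size(); i += regionsPerMapper) {
            int last = Math.min(i + regionsPerMapper, regions.size()) - 1;
            splits.add(new KeyRange(regions.get(i).startKey, regions.get(last).endKey));
        }
        return splits;
    }
}
```

With the numbers from the description, mapperCount(300, 3) gives 100 and mapperCount(1000, 1) gives 1000. Merging only adjacent regions keeps each split a single key range, so one scan can cover it.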
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211683#comment-14211683 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12681437/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681437

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.
    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//console

This message is automatically generated.
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210058#comment-14210058 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12681344/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681344

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.
    {color:red}-1 core tests{color}. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestRegionReplicas

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11664//console

This message is automatically generated.
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209708#comment-14209708 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12681309/HBASE-12394-v5.patch
  against trunk revision .
  ATTACHMENT ID: 12681309

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.
    {color:red}-1 core tests{color}. The patch failed these unit tests:
        org.apache.hadoop.hbase.regionserver.TestRegionReplicas
    {color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
        at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
        at org.apache.hadoop.hbase.ResourceCheckerJUnitListener.testFinished(ResourceCheckerJUnitListener.java:183)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11661//console

This message is automatically generated.
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207636#comment-14207636 ]

Weichen Ye commented on HBASE-12394:

[~stack] Thank you for your review and comments!

1. Supporting multiple mappers for one region is a very good idea for improvement. I have a very similar idea, which is to split the table by size. For example, suppose a table has 100 regions and one of them is huge. When we run a job with this table as input, 99 mappers complete quickly and we then wait a long time for the last one. In the current version of HBase, we deal with this data skew by manually splitting the large region before we submit the MR job. My idea is to add a config such as "max_split_size" to TableInputFormat, so that large regions are automatically cut into multiple splits in the MR job. I'll try to make another patch for this idea. What do you think?

2. About the test and the release note: sorry for not including them in the last patch. I'm working on them now.

3. About HBASE-2302: that feature excludes specific regions from the MR job for some specific reason. I'm not sure whether it is rarely used in production environments. It is really hard to support both that feature and the new feature in this issue, because the regions assigned to one mapper must be contiguous: the mapper deals with only one Scan object.

4. About the "if ... else ..." code: it is related to HBASE-2302 above. My intent is that "one mapper, one region" mode supports HBASE-2302 (excluding specific regions from the MR job), while "one mapper, multiple regions" mode does not. Duplicated code is never good, though. I'll try to abstract out some code so that the if branch and the else branch can share it.
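The size-based idea in point 1 can be illustrated with a small sketch. This is not code from any patch: the class SizeBasedSplits and its methods are hypothetical, "max_split_size" is only the working name used in the comment, and the rule of one split per started max_split_size chunk is an assumption.

```java
/** Hypothetical sketch, not part of HBase: how many splits a size limit would produce. */
final class SizeBasedSplits {

    /**
     * Splits for one region: one per started chunk of maxSplitSizeBytes,
     * and at least one split even for an empty region (assumed rule).
     */
    static int splitsForRegion(long regionSizeBytes, long maxSplitSizeBytes) {
        if (regionSizeBytes < 0 || maxSplitSizeBytes < 1) {
            throw new IllegalArgumentException("sizes must be non-negative and the limit positive");
        }
        if (regionSizeBytes == 0) {
            return 1;
        }
        // Ceiling division written so that it cannot overflow.
        return Math.toIntExact((regionSizeBytes - 1) / maxSplitSizeBytes + 1);
    }

    /** Total number of splits (and therefore mappers) over all regions of a table. */
    static int totalSplits(long[] regionSizesBytes, long maxSplitSizeBytes) {
        int total = 0;
        for (long size : regionSizesBytes) {
            total += splitsForRegion(size, maxSplitSizeBytes);
        }
        return total;
    }
}
```

For the skewed table in the comment, assume 99 regions of 1 GB, one region of 10 GB and a 1 GB limit: the huge region would be read by 10 mappers instead of 1, for 109 mappers in total.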
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207407#comment-14207407 ]

stack commented on HBASE-12394:
---

Can I pass a hbase.mapreduce.scan.regionspermapper that is < 1? E.g. .1 for ten mappers per region? It does not look like it. That'd be sweet. Could do in another issue.

Need to add a release note on how to make this feature work listing new config.

In comments, you say HBASE-2302. If you make a new patch, write out what that means.

Can you add a test to demo your new code actually works (move stuff to a static method if that helps make it testable?)

There is a bunch of code duplicated from the if branch in the else branch. Want to abstract out into a method they can share?

Thanks.
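stack's question about a value below 1 can be made concrete with a sketch of one possible interpretation. Everything here is hypothetical: the thread says the patch does not support fractional values, and the class FractionalRegionsPerMapper, its methods and the rounding choices are assumptions made only for illustration.

```java
/** Hypothetical sketch: one way a fractional regions-per-mapper value could be read. */
final class FractionalRegionsPerMapper {

    /**
     * For a ratio in (0, 1), the number of mappers that would share one region,
     * e.g. 0.1 gives 10. For a ratio of 1 or more, each region gets at most one mapper.
     */
    static int mappersPerRegion(double ratio) {
        if (!(ratio > 0)) {
            throw new IllegalArgumentException("ratio must be positive");
        }
        return ratio >= 1.0 ? 1 : (int) Math.round(1.0 / ratio);
    }

    /**
     * Estimated mapper count for a table. A ratio of 1 or more is truncated to whole
     * regions per mapper and the result is rounded up (both are assumptions).
     */
    static int estimatedMappers(int totalRegions, double ratio) {
        if (totalRegions < 0) {
            throw new IllegalArgumentException("totalRegions must be non-negative");
        }
        if (ratio >= 1.0) {
            int regionsPerMapper = (int) Math.floor(ratio);
            return (totalRegions + regionsPerMapper - 1) / regionsPerMapper;
        }
        return totalRegions * mappersPerRegion(ratio);
    }
}
```

Under this reading, a value of 3 on a 300-region table still gives 100 mappers, while a value of 0.1 on a 10-region table gives 100 mappers, ten per region.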
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204367#comment-14204367 ]

Weichen Ye commented on HBASE-12394:

Welcome to the review board. https://reviews.apache.org/r/27519/diff/3/
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204327#comment-14204327 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12680522/HBASE-12394-v4.patch
  against trunk revision .
  ATTACHMENT ID: 12680522

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.
    {color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11624//console

This message is automatically generated.
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201741#comment-14201741 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12680097/HBASE-12394-v3.patch
  against trunk revision .
  ATTACHMENT ID: 12680097

    {color:green}+1 @author{color}. The patch does not contain any @author tags.
    {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
        Please justify why no new tests are needed for this patch.
        Also please list what manual steps were performed to verify this patch.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
    {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
    {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
    {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
    {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
    {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
    {color:green}+1 site{color}. The mvn site goal succeeds with this patch.
    {color:red}-1 core tests{color}. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11612//console

This message is automatically generated.
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201662#comment-14201662 ] Weichen Ye commented on HBASE-12394: Hi all, I have uploaded a new, smaller patch. You are welcome to review it on ReviewBoard: https://reviews.apache.org/r/27519/ The latest patch is Diff Revision 2 (Latest). > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, > HBASE-12394.patch > > > Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/ > The Latest Patch is "Diff Revision 2 (Latest)" > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > hbase.mapreduce.scan.regionspermapper controls how many regions used as input > for one mapper. For example,if we have an HBase table with 300 regions, and > we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan > the table, the job will use only 300/3=100 mappers. > In this way, we can control the number of mappers using the following formula. > Number of Mappers = (Total region numbers) / > hbase.mapreduce.scan.regionspermapper > This is an example of the configuration. 
> <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, > Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
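The property described above could be read roughly as in the sketch below. This is not the patch's code: a plain Map stands in for the Hadoop Configuration object, and the class and method names are hypothetical. The default of "1" and the requirement that the value be an integer come from the patch lines quoted in the Hadoop QA report for the first patch; falling back to 1 on a bad value is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a plain Map stands in for the Hadoop Configuration object,
// and the class and method names are hypothetical.
final class RegionsPerMapperSketch {

    static final String KEY = "hbase.mapreduce.scan.regionspermapper";

    /**
     * Reads the property, defaulting to 1 (one region per mapper) when unset.
     * Falling back to 1 on a non-integer or non-positive value is an
     * assumption of this sketch, not confirmed behaviour of the patch.
     */
    static int regionsPerMapper(Map<String, String> conf) {
        String raw = conf.get(KEY);
        if (raw == null) {
            return 1;
        }
        try {
            int parsed = Integer.parseInt(raw.trim());
            return parsed >= 1 ? parsed : 1;
        } catch (NumberFormatException e) {
            System.err.println(KEY + " must be an integer, got: " + raw);
            return 1;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(KEY, "3");
        System.out.println(regionsPerMapper(conf)); // prints 3
    }
}
```

With the property unset, the sketch returns 1, which corresponds to the existing behaviour of one mapper per region.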
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199648#comment-14199648 ] Weichen Ye commented on HBASE-12394: [~stack] Certainly, that is exactly what I'm doing. A smaller patch is on the way. In the new patch I'll only change the getSplits() method in TableInputFormatBase.java and drop the redundant TableMultiRegion* code. I'm now testing the patch on a real cluster running CDH5.2.0. > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394-v2.patch, HBASE-12394.patch > > > Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/ > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTableMultiRegionMapReduceUtil.java > > The files start with * are tests. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > hbase.mapreduce.scan.regionspermapper controls how many regions used as input > for one mapper. For example,if we have an HBase table with 300 regions, and > we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan > the table, the job will use only 300/3=100 mappers. 
> In this way, we can control the number of mappers using the following formula. > Number of Mappers = (Total region numbers) / > hbase.mapreduce.scan.regionspermapper > This is an example of the configuration. > <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, > Text.class, Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199540#comment-14199540 ] stack commented on HBASE-12394: --- [~yeweichen] Did you get a chance to make the smaller patch? Thanks. > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394-v2.patch, HBASE-12394.patch > > > Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/ > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTableMultiRegionMapReduceUtil.java > > The files start with * are tests. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > hbase.mapreduce.scan.regionspermapper controls how many regions used as input > for one mapper. For example,if we have an HBase table with 300 regions, and > we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan > the table, the job will use only 300/3=100 mappers. > In this way, we can control the number of mappers using the following formula. > Number of Mappers = (Total region numbers) / > hbase.mapreduce.scan.regionspermapper > This is an example of the configuration. 
> <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, > Text.class, Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194561#comment-14194561 ] Hadoop QA commented on HBASE-12394: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch against trunk revision . ATTACHMENT ID: 12678921 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +int stopRegion = (i * regionPerMapperInt + regionPerMapperInt - 1 < keys.getFirst().length) ? +(i * regionPerMapperInt + regionPerMapperInt - 1) : (keys.getFirst().length - 1); +LOG.warn("Cannot resolve the host name for " + regionAddress + " because of " + e); +TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), Import.Importer.class, Text.class, {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11565//console This message is automatically generated. 
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194552#comment-14194552 ] Hadoop QA commented on HBASE-12394: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch against trunk revision . ATTACHMENT ID: 12678921 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +int stopRegion = (i * regionPerMapperInt + regionPerMapperInt - 1 < keys.getFirst().length) ? +(i * regionPerMapperInt + regionPerMapperInt - 1) : (keys.getFirst().length - 1); +LOG.warn("Cannot resolve the host name for " + regionAddress + " because of " + e); +TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), Import.Importer.class, Text.class, {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11564//console This message is automatically generated. 
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194489#comment-14194489 ] Weichen Ye commented on HBASE-12394: I fixed some lineLengths issues and uploaded patch v2. You are welcome to review it on ReviewBoard: https://reviews.apache.org/r/27519/ This patch is large because I added three new classes, and there is a lot of redundant code between the following pairs: TableMultiRegionInputFormat.java -- TableInputFormat.java TableMultiRegionInputFormatBase.java -- TableInputFormatBase.java TableMultiRegionMapReduceUtil.java -- TableMapReduceUtil.java I did this because I wanted this patch to leave the original code in TableInputFormat/TableInputFormatBase/TableMapReduceUtil.java unchanged. These 3 files are the core of the HBase MapReduce module, and I wanted the first patch to mind its own business and not affect anything else. I am now testing a new patch that merges the redundant code. In the new patch all changes go into TableInputFormatBase/TableInputFormat/TableMapReduceUtil.java, so the new patch will be smaller. If there are no compatibility problems, I will upload the smaller patch in a few days. So, for this patch, the only important part is the method getSplits(JobContext context) in TableInputFormatBase.java. The changes in the other files only update references and class names, such as TableInputFormat.SCAN -> TableMultiRegionInputFormat.SCAN. > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394.patch > > > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. 
This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTableMultiRegionMapReduceUtil.java > > The files start with * are tests. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > hbase.mapreduce.scan.regionspermapper controls how many regions used as input > for one mapper. For example,if we have an HBase table with 300 regions, and > we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan > the table, the job will use only 300/3=100 mappers. > In this way, we can control the number of mappers using the following formula. > Number of Mappers = (Total region numbers) / > hbase.mapreduce.scan.regionspermapper > This is an example of the configuration. > <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, > Text.class, Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
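The comment above singles out getSplits() as the essential change. The grouping it implies might look like the sketch below, which is modelled on the stopRegion expression quoted in the Hadoop QA lineLengths report for HBASE-12394-v2.patch. It is not the patch's code: it works on region indices rather than real input splits, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: groups region indices 0..regionCount-1 into consecutive
// ranges, one range per mapper. Names are hypothetical, not HBase API.
final class RegionGroupingSketch {

    /** Returns {startRegion, stopRegion} index pairs, both inclusive. */
    static List<int[]> groupRegions(int regionCount, int regionsPerMapper) {
        if (regionCount < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException("invalid region count or regions per mapper");
        }
        List<int[]> groups = new ArrayList<>();
        // long arithmetic avoids int overflow for very large arguments
        for (long start = 0; start < regionCount; start += regionsPerMapper) {
            // Clamp the last group to the final region, as the quoted
            // stopRegion expression does.
            long stop = Math.min(start + regionsPerMapper - 1, regionCount - 1L);
            groups.add(new int[] {(int) start, (int) stop});
        }
        return groups;
    }

    public static void main(String[] args) {
        for (int[] g : groupRegions(10, 3)) {
            System.out.println(g[0] + ".." + g[1]);
        }
    }
}
```

For 10 regions and 3 regions per mapper the sketch yields the ranges 0..2, 3..5, 6..8 and 9..9, so the last mapper takes whatever regions remain.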
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194146#comment-14194146 ] Weichen Ye commented on HBASE-12394: Hi Ted, hbase.mapreduce.scan.regionspermapper controls how many regions are used as input for one mapper. For example, if we have an HBase table with 300 regions and set hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 300/3 = 100 mappers. In this way, we can control the number of mappers with the following formula: Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394.patch > > > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTableMultiRegionMapReduceUtil.java > > The files start with * are tests. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > This is an example,which means each mapper has 3 regions as input. 
> <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, > Text.class, Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
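The formula in the comment above can be checked with a few lines of Java. The class and method names are hypothetical. Rounding up when the region count is not an exact multiple of the property is an assumption of this sketch; the thread only gives the evenly divisible 300/3 case.

```java
// Sketch only: hypothetical names, not part of HBase.
final class MapperCountSketch {

    /**
     * Number of mappers when each mapper takes up to regionsPerMapper regions.
     * Rounds up so that leftover regions still get a mapper (an assumption
     * for region counts that are not an exact multiple).
     */
    static int mapperCount(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException("invalid region count or regions per mapper");
        }
        int full = totalRegions / regionsPerMapper;
        return totalRegions % regionsPerMapper == 0 ? full : full + 1;
    }

    public static void main(String[] args) {
        System.out.println(mapperCount(300, 3));  // prints 100
        System.out.println(mapperCount(1000, 1)); // prints 1000
        System.out.println(mapperCount(10, 3));   // prints 4
    }
}
```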
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193306#comment-14193306 ] Ted Yu commented on HBASE-12394: Mind putting the patch on ReviewBoard? hbase.mapreduce.scan.regionspermapper controls how many mappers would be used. Have you considered specifying the number of mappers for this feature? Thanks > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394.patch > > > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTableMultiRegionMapReduceUtil.java > > The files start with * are tests. > In order to support multiple regions for one mapper, we need a new property > in configuration--"hbase.mapreduce.scan.regionspermapper" > This is an example,which means each mapper has 3 regions as input. > <property> > <name>hbase.mapreduce.scan.regionspermapper</name> > <value>3</value> > </property> > This is an example for Java code: > TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, > Text.class, Text.class, job); > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs
[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191741#comment-14191741 ] Hadoop QA commented on HBASE-12394: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12678452/HBASE-12394.patch against trunk revision . ATTACHMENT ID: 12678452 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 8 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + * See {@link org.apache.hadoop.hbase.mapreduce.TableMultiRegionMapReduceUtil#convertScanToString(org.apache.hadoop.hbase.client.Scan)} for more details. 
+ * Overrides previous calls to {@link org.apache.hadoop.hbase.client.Scan#addColumn(byte[], byte[])}for any families in the + String regionPerMapper =context.getConfiguration().get("hbase.mapreduce.scan.regionspermapper","1"); + LOG.error("ERROR when parseInt: hbase.mapreduce.scan.regionspermapper must be an integer "); +int stopRegion=(i*regionPerMapperInt+regionPerMapperInt-1 [...] https://builds.apache.org/job/PreCommit-HBASE-Build/11541//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11541//console This message is automatically generated. > Support multiple regions as input to each mapper in map/reduce jobs > --- > > Key: HBASE-12394 > URL: https://issues.apache.org/jira/browse/HBASE-12394 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0, 0.98.6.1 >Reporter: Weichen Ye > Attachments: HBASE-12394.patch > > > For Hadoop cluster, a job with large HBase table as input always consumes a > large amount of computing resources. For example, we need to create a job > with 1000 mappers to scan a table with 1000 regions. This patch is to support > one mapper using multiple regions as input. > > The following new files are included in this patch: > TableMultiRegionInputFormat.java > TableMultiRegionInputFormatBase.java > TableMultiRegionMapReduceUtil.java > *TestTableMultiRegionInputFormatScan1.java > *TestTableMultiRegionInputFormatScan2.java > *TestTableMultiRegionInputFormatScanBase.java > *TestTable