[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209708#comment-14209708 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681309/HBASE-12394-v5.patch
  against trunk revision .
  ATTACHMENT ID: 12681309

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
    at org.apache.hadoop.hbase.ResourceCheckerJUnitListener.testFinished(ResourceCheckerJUnitListener.java:183)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//console

This message is automatically generated.

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394.patch, HBase-12394 
 Document.pdf


 Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/   
 The Latest Patch is Diff Revision 2 (Latest)
 For Hadoop cluster, a job with large HBase table as input always consumes a 
 large amount of computing resources. For example, we need to create a job 
 with 1000 mappers to scan a table with 1000 regions. This patch is to support 
 one mapper 

[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210058#comment-14210058 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681344/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681344

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//console


 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394-v6.patch, 
 HBASE-12394.patch, HBase-12394 Document.pdf



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211683#comment-14211683 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681437/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681437

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//console


 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394-v6.patch, 
 HBASE-12394.patch, HBase-12394 Document.pdf



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Weichen Ye (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211698#comment-14211698 ]

Weichen Ye commented on HBASE-12394:


You are welcome to review the latest diff:
https://reviews.apache.org/r/27519/diff/#



 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394-v6.patch, 
 HBASE-12394.patch, HBase-12394 Document.pdf


 Welcome to the ReviewBoard: https://reviews.apache.org/r/27519/
 The latest patch is Diff Revision 2 (Latest).
 On a Hadoop cluster, a job that takes a large HBase table as input always consumes a large amount of computing resources. For example, we need to create a job with 1000 mappers to scan a table with 1000 regions. This patch supports one mapper using multiple regions as input.
 To support multiple regions for one mapper, we need a new configuration property, hbase.mapreduce.scan.regionspermapper, which controls how many regions are used as input for one mapper. For example, if we have an HBase table with 300 regions and we set hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 300/3 = 100 mappers.
 In this way, we can control the number of mappers using the following formula:
 Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper
 This is an example of the configuration:
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
 This is an example of the Java code:
 TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job);
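A minimal sketch of the mapper-count formula above, in plain Java with no HBase dependency. The class and method names are hypothetical and are not taken from the patch. The description only shows the evenly divisible case (300 / 3 = 100); rounding up so that leftover regions still get their own mapper is an assumption.

```java
// Hypothetical helper for:
//   Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper
final class MapperCount {
    private MapperCount() {}

    static int numMappers(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0) {
            throw new IllegalArgumentException("totalRegions must be >= 0");
        }
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        // Assumption: a partial last group still gets a mapper, so round up.
        return totalRegions / regionsPerMapper
            + (totalRegions % regionsPerMapper == 0 ? 0 : 1);
    }
}
```

For the example in the description, numMappers(300, 3) gives 100; under the rounding assumption, a 301-region table would get 101 mappers.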
  
   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-11 Thread stack (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207407#comment-14207407 ]

stack commented on HBASE-12394:
---

Can I pass a hbase.mapreduce.scan.regionspermapper that is < 1? E.g. 0.1 for 
ten mappers per region? It does not look like it. That'd be sweet. Could do in 
another issue.

Need to add a release note on how to make this feature work, listing the new config.

In comments, you say HBASE-2302. If you make a new patch, write out what that 
means.

Can you add a test to demonstrate that your new code actually works? (Move stuff 
to a static method if that helps make it testable.)

There is a bunch of code duplicated from the if branch in the else branch.  
Want to abstract out into a method they can share?

Thanks.



 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394.patch







[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-11 Thread Weichen Ye (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207636#comment-14207636 ]

Weichen Ye commented on HBASE-12394:


[~stack] Thank you for your review and comments!

1. Supporting multiple mappers for one region is a very good idea for an 
improvement. I have a very similar idea, which uses a size-based approach to make 
splits for a table. For example, suppose we have a table with 100 regions, one of 
which is huge. When we run a job with this table as input, 99 mappers complete 
quickly, and we have to wait a long time for the last mapper. In the current 
version of HBase, we always deal with this data skew by manually splitting the 
large region before we submit the MR job. My idea is to add a config like 
max_split_size to TableInputFormat, so that large regions could be automatically 
cut into multiple splits in the MR job. I'll try to make another patch for this 
idea. What do you think?
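A rough sketch of the size-based idea: each region is cut into ceil(regionSize / maxSplitSize) splits, so a single oversized region no longer leaves one slow mapper. max_split_size is only a proposal in this comment, not an existing TableInputFormat setting; the class and method names below are hypothetical, and choosing the actual split keys inside a region is deliberately left out.

```java
// Hypothetical sketch of size-based split counting for the proposed max_split_size idea.
final class SizeBasedSplits {
    private SizeBasedSplits() {}

    /** Splits for one region: 1 if it fits, otherwise ceil(size / maxSplitSize). */
    static int splitsForRegion(long regionSizeBytes, long maxSplitSizeBytes) {
        if (maxSplitSizeBytes <= 0) {
            throw new IllegalArgumentException("maxSplitSizeBytes must be > 0");
        }
        if (regionSizeBytes <= maxSplitSizeBytes) {
            return 1; // small (or empty) regions keep a single split
        }
        // Ceiling division written without the (a + b - 1) / b form, so it cannot overflow.
        long splits = regionSizeBytes / maxSplitSizeBytes
            + (regionSizeBytes % maxSplitSizeBytes == 0 ? 0 : 1);
        return Math.toIntExact(splits);
    }

    /** Total splits (one mapper each) for a table, given per-region sizes. */
    static int totalSplits(long[] regionSizesBytes, long maxSplitSizeBytes) {
        int total = 0;
        for (long size : regionSizesBytes) {
            total += splitsForRegion(size, maxSplitSizeBytes);
        }
        return total;
    }
}
```

With 99 regions of 1 GB and one region of 20 GB, a 2 GB limit yields 99 + 10 = 109 splits instead of 100, and the largest split shrinks from 20 GB to 2 GB.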

2. About the test and release note: sorry for not including them in the last 
patch. I'm working on them now.

3. About HBASE-2302: this feature excludes specific regions from the MR job for 
particular reasons. I'm not sure whether this feature is rarely used in 
production environments. It is really hard to support both this feature and the 
new feature in this issue, because the multiple regions in one mapper must be 
contiguous: the mapper deals with only one Scan object.
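To illustrate why the regions handled by one mapper must be contiguous, here is a sketch that merges runs of adjacent regions into one key range (start key of the first region, end key of the last), which is what a single Scan per mapper would cover. Keys are modeled as Strings rather than HBase byte[] row keys, and all names are hypothetical; this is not the code from the patch. If a region in the middle had been excluded (as HBASE-2302 allows), the merged range would silently read it again, so the sketch rejects non-contiguous groups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merge runs of adjacent regions into one scan range per mapper.
final class ContiguousGrouping {
    private ContiguousGrouping() {}

    /** Row-key range [startKey, endKey); "" marks the open start or end of the table. */
    static final class KeyRange {
        final String startKey;
        final String endKey;

        KeyRange(String startKey, String endKey) {
            this.startKey = startKey;
            this.endKey = endKey;
        }

        @Override
        public String toString() {
            return "[" + startKey + ", " + endKey + ")";
        }
    }

    /**
     * Groups regions (sorted by start key) into ranges of up to regionsPerMapper regions.
     * Every region in a group must end where the next one begins; otherwise one Scan
     * from the first start key to the last end key would also cover the missing region.
     */
    static List<KeyRange> group(List<KeyRange> regions, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        List<KeyRange> splits = new ArrayList<>();
        for (int first = 0; first < regions.size(); first += regionsPerMapper) {
            int last = Math.min(first + regionsPerMapper, regions.size()) - 1;
            for (int i = first; i < last; i++) {
                if (!regions.get(i).endKey.equals(regions.get(i + 1).startKey)) {
                    throw new IllegalArgumentException("regions " + regions.get(i)
                        + " and " + regions.get(i + 1) + " are not contiguous");
                }
            }
            splits.add(new KeyRange(regions.get(first).startKey, regions.get(last).endKey));
        }
        return splits;
    }
}
```

With regionsPerMapper = 1 each group holds a single region, so the contiguity check never fires; this matches the point that HBASE-2302 exclusion can still work in one-region-per-mapper mode.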

4. About the code in the if ... else ... branches: it is related to the 
HBASE-2302 issue above. I hope that in one-mapper-per-region mode, HBASE-2302 
(excluding specific regions from the MR job) would still be supported; in 
one-mapper-multiple-regions mode, the HBASE-2302 feature will not be supported.

But duplicated code is never good. I'll try to abstract out some code so that 
the if branch and the else branch can share it.


 

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394.patch







[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-09 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204327#comment-14204327 ]

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680522/HBASE-12394-v4.patch
  against trunk revision .
  ATTACHMENT ID: 12680522

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//console


 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394.patch



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-09 Thread Weichen Ye (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204367#comment-14204367 ]

Weichen Ye commented on HBASE-12394:


Please review it on ReviewBoard:
https://reviews.apache.org/r/27519/diff/3/


 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394-v4.patch, HBASE-12394.patch







[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-06 Thread Weichen Ye (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201662#comment-14201662 ]

Weichen Ye commented on HBASE-12394:


Hi all,
I have uploaded a new, smaller patch. You are welcome to review it on ReviewBoard: 
https://reviews.apache.org/r/27519/
The latest patch is Diff Revision 2 (Latest).

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
 HBASE-12394.patch


 Welcome to the ReviewBoard: https://reviews.apache.org/r/27519/
 The latest patch is Diff Revision 2 (Latest).
 On a Hadoop cluster, a job that takes a large HBase table as input always consumes a 
 large amount of computing resources. For example, scanning a table with 1000 regions 
 requires a job with 1000 mappers, because TableInputFormat creates one input split per 
 region. This patch allows one mapper to use multiple regions as input.
 To support multiple regions per mapper, we need a new configuration property: 
 hbase.mapreduce.scan.regionspermapper
 hbase.mapreduce.scan.regionspermapper controls how many regions are used as input 
 for one mapper. For example, if an HBase table has 300 regions and we set 
 hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 
 300/3 = 100 mappers.
 In this way, we can control the number of mappers using the following formula:
 Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper
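 The formula above assumes the region count divides evenly by the setting. A minimal sketch of the arithmetic, assuming (as the stopRegion clamp in the patch's getSplits() code suggests) that a final mapper picks up any leftover regions, is a ceiling division. The class and method names are hypothetical and are not part of the patch:

```java
/** Hypothetical helper: mappers needed for a scan when each mapper reads regionsPerMapper regions. */
public final class MapperCount {
    private MapperCount() {}

    /** Ceiling division: full groups of regionsPerMapper, plus one smaller group for any leftovers. */
    public static int numberOfMappers(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException("need totalRegions >= 0 and regionsPerMapper >= 1");
        }
        return (totalRegions + regionsPerMapper - 1) / regionsPerMapper;
    }

    public static void main(String[] args) {
        System.out.println(numberOfMappers(300, 3));  // 100, as in the 300/3 example
        System.out.println(numberOfMappers(1000, 3)); // 334: 333 groups of 3 plus 1 group of 1
    }
}
```

 With 1000 regions and a setting of 3, this gives 334 mappers rather than the 333 that plain integer division would suggest.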
 This is an example of the configuration:
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
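 The patch reads this property as a string with a default of 1 and parses it with Integer.parseInt, logging an error if the value is not an integer (see the regionPerMapper lines in the Hadoop QA lineLengths report on this issue). The sketch below reproduces that lookup without Hadoop: a plain Map<String, String> stands in for org.apache.hadoop.conf.Configuration, and falling back to 1 region per mapper on a bad or non-positive value is an assumption, since the patch excerpt does not show what happens after the error is logged:

```java
import java.util.Map;

/** Hypothetical helper; a Map stands in for org.apache.hadoop.conf.Configuration. */
public final class RegionsPerMapper {
    public static final String KEY = "hbase.mapreduce.scan.regionspermapper";

    private RegionsPerMapper() {}

    /** Returns the configured regions per mapper, falling back to 1 (one region per mapper). */
    public static int read(Map<String, String> conf) {
        String raw = conf.getOrDefault(KEY, "1");
        try {
            int value = Integer.parseInt(raw.trim());
            // Assumption: zero or negative values are treated like a missing setting.
            return value >= 1 ? value : 1;
        } catch (NumberFormatException e) {
            System.err.println(KEY + " must be an integer, got: " + raw);
            return 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(read(Map.of(KEY, "3")));   // 3
        System.out.println(read(Map.of()));           // 1, property not set
        System.out.println(read(Map.of(KEY, "abc"))); // 1, after printing an error
    }
}
```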
 This is an example of the Java code:
 TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job);
  
   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14201741#comment-14201741
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680097/HBASE-12394-v3.patch
  against trunk revision .
  ATTACHMENT ID: 12680097

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199540#comment-14199540
 ] 

stack commented on HBASE-12394:
---

[~yeweichen] Did you get a chance to make the smaller patch?  Thanks.

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394-v2.patch, HBASE-12394.patch


 Welcome to the ReviewBoard: https://reviews.apache.org/r/27519/
 On a Hadoop cluster, a job that takes a large HBase table as input always consumes a 
 large amount of computing resources. For example, scanning a table with 1000 regions 
 requires a job with 1000 mappers. This patch allows one mapper to use multiple regions 
 as input.
  
 The following new files are included in this patch:
 TableMultiRegionInputFormat.java
 TableMultiRegionInputFormatBase.java
 TableMultiRegionMapReduceUtil.java
 *TestTableMultiRegionInputFormatScan1.java
 *TestTableMultiRegionInputFormatScan2.java
 *TestTableMultiRegionInputFormatScanBase.java
 *TestTableMultiRegionMapReduceUtil.java
  
 Files whose names start with * are tests.
 To support multiple regions per mapper, we need a new configuration property: 
 hbase.mapreduce.scan.regionspermapper
 hbase.mapreduce.scan.regionspermapper controls how many regions are used as input 
 for one mapper. For example, if an HBase table has 300 regions and we set 
 hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 
 300/3 = 100 mappers.
 In this way, we can control the number of mappers using the following formula:
 Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper
 This is an example of the configuration:
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
 This is an example of the Java code:
 TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job);
  
   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-05 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14199648#comment-14199648
 ] 

Weichen Ye commented on HBASE-12394:


[~stack] Certainly, that is exactly what I'm doing. A smaller patch is on the 
way. In the new patch I'll only change the getSplits() method in 
TableInputFormatBase.java and drop the redundant TableMultiRegion* code.

I'm now testing the patch on a real cluster running CDH5.2.0.



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194489#comment-14194489
 ] 

Weichen Ye commented on HBASE-12394:


I fixed some lineLengths issues and uploaded patch v2.
Please review it on ReviewBoard:
https://reviews.apache.org/r/27519/

This patch is large because it adds three new classes, and there is a lot of 
redundant code between the following pairs:
TableMultiRegionInputFormat.java --  TableInputFormat.java
TableMultiRegionInputFormatBase.java --  TableInputFormatBase.java  
TableMultiRegionMapReduceUtil.java --  TableMapReduceUtil.java

I did this because I wanted this patch to leave the original 
TableInputFormat/TableInputFormatBase/TableMapReduceUtil.java unchanged. These three 
files are the core of the HBase MapReduce module, and I wanted the first patch to mind 
its own business and not affect anything else.

I am now testing a new patch that combines the redundant code. In the new patch, all 
changes are merged into TableInputFormatBase/TableInputFormat/TableMapReduceUtil.java, 
so it will be smaller. If there are no compatibility problems, I will upload the 
smaller patch in a few days.

So, for this patch, the only important part is the method getSplits(JobContext 
context) in TableInputFormatBase.java. 
The changes in the other files only concern references and class names, such as 
(TableInputFormat.SCAN -> TableMultiRegionInputFormat.SCAN).
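The core idea of the getSplits(JobContext context) change can be sketched without HBase: take the table's region start and end keys in order and merge every regionsPerMapper consecutive regions into one key range. This sketch uses String keys and a hypothetical KeyRange record instead of HBase's byte[] keys and TableSplit, and it ignores region locations; the stopRegion computation is equivalent to the expression quoted from the patch in the Hadoop QA lineLengths reports on this issue:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: group consecutive regions so each group becomes one input split. */
public final class RegionGrouping {
    /** A merged key range; an empty string means "table start" or "table end", as in HBase. */
    public record KeyRange(String startKey, String endKey) {}

    private RegionGrouping() {}

    public static List<KeyRange> group(String[] startKeys, String[] endKeys, int regionsPerMapper) {
        if (startKeys.length != endKeys.length || regionsPerMapper < 1) {
            throw new IllegalArgumentException("mismatched region keys or regionsPerMapper < 1");
        }
        int regions = startKeys.length;
        List<KeyRange> splits = new ArrayList<>();
        for (int i = 0; i * regionsPerMapper < regions; i++) {
            int startRegion = i * regionsPerMapper;
            // Same clamp as the patch: the last group may hold fewer regions.
            int stopRegion = Math.min(startRegion + regionsPerMapper - 1, regions - 1);
            splits.add(new KeyRange(startKeys[startRegion], endKeys[stopRegion]));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Five regions: ["", b), [b, d), [d, f), [f, h), [h, "")
        String[] starts = {"", "b", "d", "f", "h"};
        String[] ends = {"b", "d", "f", "h", ""};
        for (KeyRange r : group(starts, ends, 2)) {
            System.out.println("[" + r.startKey() + ", " + r.endKey() + ")");
        }
    }
}
```

With five regions and regionsPerMapper = 2, this yields three splits, [, d), [d, h) and [h, ), where an empty key means the start or end of the table.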




 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394.patch


 On a Hadoop cluster, a job that takes a large HBase table as input always consumes a 
 large amount of computing resources. For example, scanning a table with 1000 regions 
 requires a job with 1000 mappers. This patch allows one mapper to use multiple regions 
 as input.
  
 The following new files are included in this patch:
 TableMultiRegionInputFormat.java
 TableMultiRegionInputFormatBase.java
 TableMultiRegionMapReduceUtil.java
 *TestTableMultiRegionInputFormatScan1.java
 *TestTableMultiRegionInputFormatScan2.java
 *TestTableMultiRegionInputFormatScanBase.java
 *TestTableMultiRegionMapReduceUtil.java
  
 Files whose names start with * are tests.
 To support multiple regions per mapper, we need a new configuration property: 
 hbase.mapreduce.scan.regionspermapper
 hbase.mapreduce.scan.regionspermapper controls how many regions are used as input 
 for one mapper. For example, if an HBase table has 300 regions and we set 
 hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 
 300/3 = 100 mappers.
 In this way, we can control the number of mappers using the following formula:
 Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper
 This is an example of the configuration:
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
 This is an example of the Java code:
 TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job);
  
   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194552#comment-14194552
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12678921

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+int stopRegion = (i * regionPerMapperInt + regionPerMapperInt - 1 < keys.getFirst().length) ?
+(i * regionPerMapperInt + regionPerMapperInt - 1) : (keys.getFirst().length - 1);
+LOG.warn("Cannot resolve the host name for " + regionAddress + " because of " + e);
+TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), Import.Importer.class, Text.class,

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194561#comment-14194561
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12678921

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+int stopRegion = (i * regionPerMapperInt + regionPerMapperInt - 1 < keys.getFirst().length) ?
+(i * regionPerMapperInt + regionPerMapperInt - 1) : (keys.getFirst().length - 1);
+LOG.warn("Cannot resolve the host name for " + regionAddress + " because of " + e);
+TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), Import.Importer.class, Text.class,

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-02 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14194146#comment-14194146
 ] 

Weichen Ye commented on HBASE-12394:


Hi Ted,

hbase.mapreduce.scan.regionspermapper controls how many regions are used as input 
for one mapper. For example, if an HBase table has 300 regions and we set 
hbase.mapreduce.scan.regionspermapper = 3, a job that scans the table will use only 
300/3 = 100 mappers.

In this way, we can control the number of mappers using the following formula:
Number of Mappers = (Total number of regions) / hbase.mapreduce.scan.regionspermapper



 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: HBASE-12394
 URL: https://issues.apache.org/jira/browse/HBASE-12394
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 2.0.0, 0.98.6.1
Reporter: Weichen Ye
 Attachments: HBASE-12394.patch


 On a Hadoop cluster, a job that takes a large HBase table as input always consumes a 
 large amount of computing resources. For example, scanning a table with 1000 regions 
 requires a job with 1000 mappers. This patch allows one mapper to use multiple regions 
 as input.
  
 The following new files are included in this patch:
 TableMultiRegionInputFormat.java
 TableMultiRegionInputFormatBase.java
 TableMultiRegionMapReduceUtil.java
 *TestTableMultiRegionInputFormatScan1.java
 *TestTableMultiRegionInputFormatScan2.java
 *TestTableMultiRegionInputFormatScanBase.java
 *TestTableMultiRegionMapReduceUtil.java
  
 Files whose names start with * are tests.
 To support multiple regions per mapper, we need a new configuration property: 
 hbase.mapreduce.scan.regionspermapper
 The following example configuration gives each mapper 3 regions as input:
 <property>
   <name>hbase.mapreduce.scan.regionspermapper</name>
   <value>3</value>
 </property>
 This is an example of the Java code:
 TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, Text.class, job);
  
   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14193306#comment-14193306
 ] 

Ted Yu commented on HBASE-12394:


Would you mind putting the patch on ReviewBoard?
hbase.mapreduce.scan.regionspermapper controls how many mappers would be used.
Have you considered specifying the number of mappers for this feature?

Thanks
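One way to act on this suggestion would be to derive the regions-per-mapper value from a requested mapper count. This is only a sketch under that assumption: neither the helper below nor any "number of mappers" property exists in the patch, and the ceiling division assumes the last mapper takes any leftover regions:

```java
/** Hypothetical: turn a desired mapper count into a regions-per-mapper value. */
public final class DesiredMappers {
    private DesiredMappers() {}

    /** Smallest regions-per-mapper value that keeps the job at or below desiredMappers mappers. */
    public static int regionsPerMapperFor(int totalRegions, int desiredMappers) {
        if (totalRegions < 1 || desiredMappers < 1) {
            throw new IllegalArgumentException("totalRegions and desiredMappers must be >= 1");
        }
        // Ceiling division, so the resulting mapper count never exceeds the request.
        return (totalRegions + desiredMappers - 1) / desiredMappers;
    }

    public static void main(String[] args) {
        System.out.println(regionsPerMapperFor(300, 100));  // 3
        System.out.println(regionsPerMapperFor(1000, 300)); // 4, which yields 250 mappers
    }
}
```

Asking for more mappers than there are regions simply yields 1 region per mapper, which matches the current one-split-per-region behavior.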



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14191741#comment-14191741
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678452/HBASE-12394.patch
  against trunk revision .
  ATTACHMENT ID: 12678452

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+   * See {@link org.apache.hadoop.hbase.mapreduce.TableMultiRegionMapReduceUtil#convertScanToString(org.apache.hadoop.hbase.client.Scan)} for more details.
+   * Overrides previous calls to {@link org.apache.hadoop.hbase.client.Scan#addColumn(byte[], byte[])}for any families in the
+ String regionPerMapper =context.getConfiguration().get("hbase.mapreduce.scan.regionspermapper","1");
+ LOG.error("ERROR when parseInt: hbase.mapreduce.scan.regionspermapper must be an integer ");
+int stopRegion=(i*regionPerMapperInt+regionPerMapperInt-1<keys.getFirst().length)?(i*regionPerMapperInt+regionPerMapperInt-1):(keys.getFirst().length-1);
+InetSocketAddress isa = new InetSocketAddress(location.getHostname(), location.getPort());
+   * This optimization is effective when there is a specific reasoning to exclude an entire region from the M-R job,
+   * Useful when we need to remember the last-processed top record and revisit the [last, current) interval for M-R processing,
+   * continuously. In addition to reducing InputSplits, reduces the load on the region server as well, due to the ordering of the keys.
+   * Override this method, if you want to bulk exclude regions altogether from M-R. By default, no region is excluded( i.e. all regions are included).

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//console

This message is automatically generated.

 Support multiple regions as input to each mapper in map/reduce jobs
 ---

 Key: