[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211698#comment-14211698
 ] 

Weichen Ye commented on HBASE-12394:


Please review the latest diff.
https://reviews.apache.org/r/27519/diff/#



> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0
> Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
> HBASE-12394-v4.patch, HBASE-12394-v5.patch, HBASE-12394-v6.patch, 
> HBASE-12394.patch, HBase-12394 Document.pdf
>
>
> Review Board: https://reviews.apache.org/r/27519/
> The latest patch is "Diff Revision 2 (Latest)".
> On a Hadoop cluster, a job that takes a large HBase table as input consumes a 
> large amount of computing resources. For example, scanning a table with 1000 
> regions requires a job with 1000 mappers. This patch lets one mapper take 
> multiple regions as input.
> To support multiple regions per mapper, the patch adds a new configuration 
> property, "hbase.mapreduce.scan.regionspermapper", which controls how many 
> regions are used as input for one mapper. For example, if an HBase table has 
> 300 regions and hbase.mapreduce.scan.regionspermapper = 3, a job that scans 
> the table uses only 300/3 = 100 mappers.
> The number of mappers therefore follows this formula:
> Number of mappers = (total number of regions) / 
> hbase.mapreduce.scan.regionspermapper
> This is an example of the configuration:
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, 
> Text.class, job);
>  
>   
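The formula in the description can be illustrated with a small, self-contained Java sketch. It is not taken from the patch: the class and method names are hypothetical, row keys are simplified to strings, and rounding up when the region count is not a multiple of the setting is an assumption. It groups consecutive regions into one key range per mapper, since a mapper that works through a single Scan needs a contiguous range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names come from the HBASE-12394 patch.
public class RegionGroupingSketch {

    // Number of mappers for a table. Rounding up for a remainder is an assumption.
    public static int mapperCount(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException(
                "need totalRegions >= 0 and regionsPerMapper >= 1");
        }
        return (totalRegions + regionsPerMapper - 1) / regionsPerMapper;
    }

    // Each region is {startKey, endKey}; the list must be sorted and adjacent.
    // Returns one {startKey, endKey} range per mapper, covering up to
    // regionsPerMapper consecutive regions.
    public static List<String[]> groupRegions(List<String[]> regions, int regionsPerMapper) {
        if (regionsPerMapper < 1) {
            throw new IllegalArgumentException("regionsPerMapper must be >= 1");
        }
        List<String[]> splits = new ArrayList<>();
        for (int i = 0; i < regions.size(); i += regionsPerMapper) {
            int last = Math.min(i + regionsPerMapper, regions.size()) - 1;
            // The range starts at the first region's start key and ends at the last region's end key.
            splits.add(new String[] {regions.get(i)[0], regions.get(last)[1]});
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(mapperCount(300, 3)); // 100, as in the description

        // Five adjacent regions; an empty string stands for an open table boundary.
        List<String[]> regions = Arrays.asList(
            new String[] {"", "b"}, new String[] {"b", "d"}, new String[] {"d", "f"},
            new String[] {"f", "h"}, new String[] {"h", ""});
        for (String[] split : groupRegions(regions, 2)) {
            System.out.println("[" + split[0] + ", " + split[1] + ")");
        }
    }
}
```

With five regions and a setting of 2, the sketch yields three ranges, the last covering only one region.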



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14211683#comment-14211683
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681437/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681437

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11668//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14210058#comment-14210058
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681344/HBASE-12394-v6.patch
  against trunk revision .
  ATTACHMENT ID: 12681344

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11664//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14209708#comment-14209708
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12681309/HBASE-12394-v5.patch
  against trunk revision .
  ATTACHMENT ID: 12681309

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 6 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 1 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestRegionReplicas

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):
    at org.apache.hadoop.hbase.regionserver.TestRegionReplicas.testVerifySecondaryAbilityToReadWithOnFiles(TestRegionReplicas.java:421)
    at org.apache.hadoop.hbase.ResourceCheckerJUnitListener.testFinished(ResourceCheckerJUnitListener.java:183)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11661//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-11 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207636#comment-14207636
 ] 

Weichen Ye commented on HBASE-12394:


[~stack] Thank you for your review and comments!

1. Supporting multiple mappers for one region is a very good idea for an 
improvement. I have a very similar idea, which uses a size-based approach to 
create splits for a table. For example, suppose a table has 100 regions, one of 
which is huge. When a job uses this table as input, 99 mappers complete 
quickly, and we have to wait a long time for the last mapper. In the current 
version of HBase, we deal with this data skew by manually splitting the large 
region before submitting the MR job. My idea is to add a config such as 
"max_split_size" to TableInputFormat, so that large regions are automatically 
cut into multiple splits in the MR job. I'll try to make another patch for this 
idea. What do you think?

2. About the test and release note: sorry for not including them in the last 
patch. I'm working on them now.

3. About HBASE-2302: this feature excludes specific regions from the MR job for 
specific reasons. I'm not sure whether this feature is rarely used in 
production environments. It is really hard to support both this feature and the 
new feature in this issue, because the regions assigned to one mapper must be 
contiguous: the mapper deals with only one Scan object.

4. About the "if ... else ..." code: it is related to the HBASE-2302 issue 
above. I hope that in "one mapper, one region" mode, HBASE-2302 (excluding 
specific regions from the MR job) remains supported; in "one mapper, multiple 
regions" mode, the HBASE-2302 feature will not be supported.

But duplicated code is never good. I'll try to abstract out some code that the 
if branch and the else branch can share.
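The size-based idea in point 1 can be sketched in plain Java. This is only an illustration of the arithmetic, not code from any patch: "max_split_size" is the config name proposed in the comment, while the class name, method names, the rounding-up rule, and the sizes in the example are assumptions.

```java
// Hypothetical illustration of size-based splitting; not part of HBase.
public class SizeBasedSplitSketch {

    // Splits for one region under a maximum split size, rounding up.
    // An empty region is assumed to still get one split.
    public static long splitsForRegion(long regionSizeBytes, long maxSplitSizeBytes) {
        if (regionSizeBytes < 0 || maxSplitSizeBytes < 1) {
            throw new IllegalArgumentException(
                "need regionSizeBytes >= 0 and maxSplitSizeBytes >= 1");
        }
        if (regionSizeBytes == 0) {
            return 1;
        }
        return (regionSizeBytes + maxSplitSizeBytes - 1) / maxSplitSizeBytes;
    }

    // Total splits (and therefore mappers) for a whole table.
    public static long totalSplits(long[] regionSizesBytes, long maxSplitSizeBytes) {
        long total = 0;
        for (long size : regionSizesBytes) {
            total += splitsForRegion(size, maxSplitSizeBytes);
        }
        return total;
    }

    public static void main(String[] args) {
        final long GB = 1024L * 1024L * 1024L;

        // The skewed table from the comment: 99 ordinary regions and one huge region.
        // The concrete sizes (1 GB and 50 GB) are made up for the example.
        long[] sizes = new long[100];
        java.util.Arrays.fill(sizes, GB);
        sizes[99] = 50 * GB;

        System.out.println(splitsForRegion(50 * GB, GB)); // 50
        System.out.println(totalSplits(sizes, GB));       // 149
    }
}
```

Under these assumptions the huge region is scanned by 50 mappers instead of one, so the job no longer waits on a single long-running mapper.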


 


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-11 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207407#comment-14207407
 ] 

stack commented on HBASE-12394:
---

Can I pass a hbase.mapreduce.scan.regionspermapper that is < 1?  E.g., 0.1 for 
ten mappers per region?  It does not look like it. That'd be sweet. Could do in 
another issue.

Need to add a release note on how to make this feature work, listing the new 
config.

In comments, you say HBASE-2302. If you make a new patch, write out what that 
means.

Can you add a test to demo your new code actually works (move stuff to a static 
method if that helps make it testable?)

There is a bunch of code duplicated from the if branch in the else branch.  
Want to abstract out into a method they can share?

Thanks.




[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-09 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204367#comment-14204367
 ] 

Weichen Ye commented on HBASE-12394:


Please review the patch on Review Board:
https://reviews.apache.org/r/27519/diff/3/



[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14204327#comment-14204327
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680522/HBASE-12394-v4.patch
  against trunk revision .
  ATTACHMENT ID: 12680522

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11624//console

This message is automatically generated.


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201741#comment-14201741
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12680097/HBASE-12394-v3.patch
  against trunk revision .
  ATTACHMENT ID: 12680097

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11612//console

This message is automatically generated.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
> HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/   
> The Latest Patch is "Diff Revision 2 (Latest)"
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> hbase.mapreduce.scan.regionspermapper controls how many regions used as input 
> for one mapper. For example,if we have an HBase table with 300 regions, and 
> we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a 

[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-06 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201662#comment-14201662
 ] 

Weichen Ye commented on HBASE-12394:


Hi all,
I have uploaded a new, smaller patch. Reviews are welcome on ReviewBoard: 
https://reviews.apache.org/r/27519/
The latest patch is Diff Revision 2 (Latest).

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394-v3.patch, 
> HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/   
> The Latest Patch is "Diff Revision 2 (Latest)"
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> hbase.mapreduce.scan.regionspermapper controls how many regions used as input 
> for one mapper. For example,if we have an HBase table with 300 regions, and 
> we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan 
> the table, the job will use only 300/3=100 mappers.
> In this way, we can control the number of mappers using the following formula.
> Number of Mappers = (Total region numbers) / 
> hbase.mapreduce.scan.regionspermapper
> This is an example of the configuration.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, Text.class, 
> Text.class, job);
>  
>   



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-05 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199648#comment-14199648
 ] 

Weichen Ye commented on HBASE-12394:


[~stack] Certainly, that is exactly what I'm doing. A smaller patch is on the 
way. In the new patch I'll change only the getSplits() method in 
TableInputFormatBase.java and drop the redundant TableMultiRegion* code.

I'm now testing the patch on a real cluster running CDH5.2.0.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>  
> The files start with * are tests.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> hbase.mapreduce.scan.regionspermapper controls how many regions used as input 
> for one mapper. For example,if we have an HBase table with 300 regions, and 
> we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan 
> the table, the job will use only 300/3=100 mappers.
> In this way, we can control the number of mappers using the following formula.
> Number of Mappers = (Total region numbers) / 
> hbase.mapreduce.scan.regionspermapper
> This is an example of the configuration.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
> Text.class, Text.class, job);
>  
>   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199540#comment-14199540
 ] 

stack commented on HBASE-12394:
---

[~yeweichen] Did you get a chance to make the smaller patch?  Thanks.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>  
> The files start with * are tests.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> hbase.mapreduce.scan.regionspermapper controls how many regions used as input 
> for one mapper. For example,if we have an HBase table with 300 regions, and 
> we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan 
> the table, the job will use only 300/3=100 mappers.
> In this way, we can control the number of mappers using the following formula.
> Number of Mappers = (Total region numbers) / 
> hbase.mapreduce.scan.regionspermapper
> This is an example of the configuration.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
> Text.class, Text.class, job);
>  
>   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194561#comment-14194561
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12678921

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+int stopRegion = (i * regionPerMapperInt + regionPerMapperInt 
- 1 < keys.getFirst().length) ?
+(i * regionPerMapperInt + regionPerMapperInt - 1) : 
(keys.getFirst().length - 1);
+LOG.warn("Cannot resolve the host name for " + 
regionAddress + " because of " + e);
+TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), 
Import.Importer.class, Text.class,

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11565//console

This message is automatically generated.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFo

[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194552#comment-14194552
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678921/HBASE-12394-v2.patch
  against trunk revision .
  ATTACHMENT ID: 12678921

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+int stopRegion = (i * regionPerMapperInt + regionPerMapperInt 
- 1 < keys.getFirst().length) ?
+(i * regionPerMapperInt + regionPerMapperInt - 1) : 
(keys.getFirst().length - 1);
+LOG.warn("Cannot resolve the host name for " + 
regionAddress + " because of " + e);
+TableMultiRegionMapReduceUtil.initTableMapperJob("Table", new Scan(), 
Import.Importer.class, Text.class,

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11564//console

This message is automatically generated.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394-v2.patch, HBASE-12394.patch
>
>
> Welcome to the ReviewBoard :https://reviews.apache.org/r/27519/
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFo

[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-03 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194489#comment-14194489
 ] 

Weichen Ye commented on HBASE-12394:


I fixed the lineLengths issues and uploaded the patch as v2.
Reviews are welcome on ReviewBoard:
https://reviews.apache.org/r/27519/

This patch is large because I added three new classes, and there is a lot of 
redundant code between the following pairs:
TableMultiRegionInputFormat.java -- TableInputFormat.java
TableMultiRegionInputFormatBase.java -- TableInputFormatBase.java
TableMultiRegionMapReduceUtil.java -- TableMapReduceUtil.java

I did this because I wanted this patch to leave the original 
TableInputFormat.java, TableInputFormatBase.java and TableMapReduceUtil.java 
unchanged. These three files are the core of the HBase MapReduce module, and I 
wanted the first patch to mind its own business and not affect other code.

I am now testing a new patch that merges the redundant code. In the new patch 
all changes go into TableInputFormatBase.java, TableInputFormat.java and 
TableMapReduceUtil.java, so it will be smaller. If there are no compatibility 
problems, I will upload the smaller patch in a few days.

So, for this patch, the only important part is the method getSplits(JobContext 
context) in TableInputFormatBase.java.
The changes in the other files only update references and class names, such as 
(TableInputFormat.SCAN -> TableMultiRegionInputFormat.SCAN).
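
To illustrate the grouping that getSplits() has to perform, here is a minimal, 
self-contained Java sketch. It is not the code from the patch: the class and 
method names (RegionGrouping, groupRegions) are hypothetical, and regions are 
represented only by their index. The end-of-group clamp follows the stopRegion 
expression quoted in the Hadoop QA lineLengths report for the v2 patch, under 
the assumption that the last mapper takes whatever regions remain.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionGrouping {

    /**
     * Groups region indexes 0..regionCount-1 into consecutive ranges.
     * Each returned array is {firstRegion, lastRegion}, both inclusive,
     * and corresponds to the input of one mapper.
     */
    static List<int[]> groupRegions(int regionCount, int regionsPerMapper) {
        if (regionCount < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException(
                "regionCount must be >= 0 and regionsPerMapper >= 1");
        }
        List<int[]> groups = new ArrayList<>();
        for (int start = 0; start < regionCount; start += regionsPerMapper) {
            // Clamp the last group so it never runs past the final region.
            int stop = Math.min(start + regionsPerMapper - 1, regionCount - 1);
            groups.add(new int[] {start, stop});
        }
        return groups;
    }

    public static void main(String[] args) {
        // 7 regions, 3 per mapper: ranges 0..2, 3..5 and 6..6.
        for (int[] group : groupRegions(7, 3)) {
            System.out.println("regions " + group[0] + ".." + group[1]);
        }
    }
}
```

In a real input format each range would presumably be turned into one split 
that spans from the start key of the first region to the end key of the last 
region in the range; that part is omitted here.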




> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394.patch
>
>
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>  
> The files start with * are tests.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> hbase.mapreduce.scan.regionspermapper controls how many regions used as input 
> for one mapper. For example,if we have an HBase table with 300 regions, and 
> we set hbase.mapreduce.scan.regionspermapper = 3. Then we run a job to scan 
> the table, the job will use only 300/3=100 mappers.
> In this way, we can control the number of mappers using the following formula.
> Number of Mappers = (Total region numbers) / 
> hbase.mapreduce.scan.regionspermapper
> This is an example of the configuration.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
> Text.class, Text.class, job);
>  
>   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-02 Thread Weichen Ye (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194146#comment-14194146
 ] 

Weichen Ye commented on HBASE-12394:


Hi Ted,

hbase.mapreduce.scan.regionspermapper controls how many regions are used as 
input for one mapper. For example, if we have an HBase table with 300 regions 
and we set hbase.mapreduce.scan.regionspermapper = 3, then a job that scans the 
table will use only 300/3 = 100 mappers.

In this way, we can control the number of mappers using the following formula:
Number of Mappers = (Total number of regions) / 
hbase.mapreduce.scan.regionspermapper
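
The formula can be sketched in Java as follows. This is an illustration only, 
not code from the patch: MapperCount and numberOfMappers are hypothetical 
names. The formula as stated uses plain division; the sketch rounds up, on the 
assumption that a leftover group of fewer regions still needs its own mapper 
when the region count is not a multiple of the setting.

```java
public class MapperCount {

    /**
     * Number of mappers for a table scan when each mapper reads up to
     * regionsPerMapper regions. Rounds up so that leftover regions
     * are still covered by one extra mapper.
     */
    static int numberOfMappers(int totalRegions, int regionsPerMapper) {
        if (totalRegions < 0 || regionsPerMapper < 1) {
            throw new IllegalArgumentException(
                "totalRegions must be >= 0 and regionsPerMapper >= 1");
        }
        return (totalRegions + regionsPerMapper - 1) / regionsPerMapper;
    }

    public static void main(String[] args) {
        System.out.println(numberOfMappers(300, 3));   // prints 100
        System.out.println(numberOfMappers(1000, 3));  // prints 334
        System.out.println(numberOfMappers(1000, 1));  // prints 1000
    }
}
```

With a value of 1 the count equals the number of regions, which matches the 
one-mapper-per-region behaviour described in the issue.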



> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394.patch
>
>
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>  
> The files start with * are tests.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> This is an example,which means each mapper has 3 regions as input.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
> Text.class, Text.class, job);
>  
>   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-11-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193306#comment-14193306
 ] 

Ted Yu commented on HBASE-12394:


Mind putting the patch on ReviewBoard?
hbase.mapreduce.scan.regionspermapper controls how many mappers would be used.
Have you considered specifying the number of mappers for this feature?

Thanks
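
One possible reading of this suggestion, sketched in Java: the user states a 
target number of mappers and the regions-per-mapper value is derived from it. 
Nothing here comes from the patch; RegionsPerMapper and forTargetMappers are 
hypothetical names, and rounding up is an assumption made so that the job never 
uses more mappers than requested.

```java
public class RegionsPerMapper {

    /**
     * Derives a regions-per-mapper value from a desired mapper count.
     * Rounds up, so the resulting job uses at most desiredMappers mappers,
     * and never returns less than 1.
     */
    static int forTargetMappers(int totalRegions, int desiredMappers) {
        if (totalRegions < 0 || desiredMappers < 1) {
            throw new IllegalArgumentException(
                "totalRegions must be >= 0 and desiredMappers >= 1");
        }
        int perMapper = (totalRegions + desiredMappers - 1) / desiredMappers;
        return Math.max(1, perMapper);
    }

    public static void main(String[] args) {
        System.out.println(forTargetMappers(300, 100));   // prints 3
        System.out.println(forTargetMappers(1000, 300));  // prints 4
    }
}
```

In the second case the derived value of 4 gives 250 mappers for 1000 regions, 
which stays under the requested 300.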

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394.patch
>
>
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTableMultiRegionMapReduceUtil.java
>  
> The files start with * are tests.
> In order to support multiple regions for one mapper, we need a new property 
> in configuration--"hbase.mapreduce.scan.regionspermapper"
> This is an example,which means each mapper has 3 regions as input.
> <property>
>   <name>hbase.mapreduce.scan.regionspermapper</name>
>   <value>3</value>
> </property>
> This is an example for Java code:
> TableMultiRegionMapReduceUtil.initTableMapperJob(tablename, scan, Map.class, 
> Text.class, Text.class, job);
>  
>   





[jira] [Commented] (HBASE-12394) Support multiple regions as input to each mapper in map/reduce jobs

2014-10-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14191741#comment-14191741
 ] 

Hadoop QA commented on HBASE-12394:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12678452/HBASE-12394.patch
  against trunk revision .
  ATTACHMENT ID: 12678452

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 8 new 
or modified tests.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+   * See {@link 
org.apache.hadoop.hbase.mapreduce.TableMultiRegionMapReduceUtil#convertScanToString(org.apache.hadoop.hbase.client.Scan)}
 for more details.
+   * Overrides previous calls to {@link 
org.apache.hadoop.hbase.client.Scan#addColumn(byte[], byte[])}for any families 
in the
+ String regionPerMapper 
=context.getConfiguration().get("hbase.mapreduce.scan.regionspermapper","1");
+ LOG.error("ERROR when parseInt: hbase.mapreduce.scan.regionspermapper 
must be an integer ");
+int 
stopRegion=(i*regionPerMapperInt+regionPerMapperInt-1

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/11541//console

This message is automatically generated.

> Support multiple regions as input to each mapper in map/reduce jobs
> ---
>
> Key: HBASE-12394
> URL: https://issues.apache.org/jira/browse/HBASE-12394
> Project: HBase
>  Issue Type: Improvement
>  Components: mapreduce
>Affects Versions: 2.0.0, 0.98.6.1
>Reporter: Weichen Ye
> Attachments: HBASE-12394.patch
>
>
> For Hadoop cluster, a job with large HBase table as input always consumes a 
> large amount of computing resources. For example, we need to create a job 
> with 1000 mappers to scan a table with 1000 regions. This patch is to support 
> one mapper using multiple regions as input.
>  
> The following new files are included in this patch:
> TableMultiRegionInputFormat.java
> TableMultiRegionInputFormatBase.java
> TableMultiRegionMapReduceUtil.java
> *TestTableMultiRegionInputFormatScan1.java
> *TestTableMultiRegionInputFormatScan2.java
> *TestTableMultiRegionInputFormatScanBase.java
> *TestTable