[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14590459#comment-14590459
 ] 

Hudson commented on HBASE-13356:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #984 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/984/])
HBASE-13356 HBase should provide an InputFormat supporting multiple scans in 
mapreduce jobs over snapshots (Andrew Mains) (apurtell: rev 
1013b61bb0c5710bcc34b53ad8b6c6489def3305)
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapred/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/MultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatTestBase.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Fix For: 2.0.0, 0.98.14, 1.2.0
>
> Attachments: HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, 
> HBASE-13356.2.patch, HBASE-13356.3.patch, HBASE-13356.4.patch, 
> HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-17 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14591135#comment-14591135
 ] 

Hudson commented on HBASE-13356:


SUCCESS: Integrated in HBase-0.98 #1031 (See 
[https://builds.apache.org/job/HBase-0.98/1031/])
HBASE-13356 HBase should provide an InputFormat supporting multiple scans in 
mapreduce jobs over snapshots (Andrew Mains) (apurtell: rev 
1013b61bb0c5710bcc34b53ad8b6c6489def3305)
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/MultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapred/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormatImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatTestBase.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java




[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-26 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14603965#comment-14603965
 ] 

Hudson commented on HBASE-13356:


FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #995 (See 
[https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/995/])
Amend HBASE-13356 HBase should provide an InputFormat supporting multiple scans 
in mapreduce jobs over snapshots (Andrew Mains) (apurtell: rev 
cfb4827326b6743cb732b92580152bcf46647b2c)
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java


> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Fix For: 2.0.0, 0.98.14, 1.2.0
>
> Attachments: HBASE-13356-0.98-addendum-hadoop-1.patch, 
> HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, HBASE-13356.2.patch, 
> HBASE-13356.3.patch, HBASE-13356.4.patch, HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-27 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14604044#comment-14604044
 ] 

Hudson commented on HBASE-13356:


FAILURE: Integrated in HBase-0.98 #1042 (See 
[https://builds.apache.org/job/HBase-0.98/1042/])
Amend HBASE-13356 HBase should provide an InputFormat supporting multiple scans 
in mapreduce jobs over snapshots (Andrew Mains) (apurtell: rev 
cfb4827326b6743cb732b92580152bcf46647b2c)
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java




[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-03-29 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386231#comment-14386231
 ] 

Andrew Mains commented on HBASE-13356:
--

Spent some time speccing out a potential implementation for this today:

Interface: 

Jobs wanting to run multiple scans over snapshots can use 
MultiTableSnapshotInputFormat. This can be configured using TableMapReduceUtil, 
as usual, with the signature:
{code}
  /**
   * Sets up the job for reading from one or more table snapshots, with one or
   * more scans per snapshot.
   * It bypasses HBase servers and reads directly from snapshot files.
   *
   * @param snapshotScans map of snapshot name to a list of scans on that snapshot.
   * @param mapper  The mapper class to use.
   * @param outputKeyClass  The class of the output key.
   * @param outputValueClass  The class of the output value.
   * @param job  The current job to adjust.  Make sure the passed job is
   *   carrying all necessary HBase configuration.
   * @param addDependencyJars upload HBase jars and jars for any of the configured
   *   job classes via the distributed cache (tmpjars).
   * @param tmpRestoreDir temporary directory into which the snapshots are restored.
   */
  public static void initMultiTableSnapshotMapperJob(Map<String, List<Scan>> snapshotScans,
      Class mapper,
      Class outputKeyClass,
      Class outputValueClass,
      Job job,
      boolean addDependencyJars,
      Path tmpRestoreDir
  ) throws IOException {
{code}

Implementation:

Most of the work can be done through delegation to 
TableSnapshotInputFormatImpl. The primary change would be to make 
TableSnapshotInputFormatImpl.InputSplit take in a scan object and restoreDir 
path, instead of retrieving these from the job configuration. This would allow 
MultiTableSnapshotInputFormat to avoid setting an individual scan and restore 
directory on the configuration (they can be passed along by way of the split, 
similar to TableSplit).
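
The idea of passing the scan and restore directory along with each split can be sketched in plain Java. This is only an illustration under stated assumptions, not the patch: SnapshotSplit, getSplits, and the String stand-ins for Scan, region information and Path are hypothetical names chosen for the example, and a real implementation would presumably also skip regions that fall outside a scan's row range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitCarriesScanSketch {

  /** Hypothetical split: the scan and restore directory travel with it. */
  static final class SnapshotSplit {
    final String snapshotName;
    final String region;      // stand-in for region information
    final String scan;        // stand-in for a serialized Scan
    final String restoreDir;  // stand-in for a filesystem path

    SnapshotSplit(String snapshotName, String region, String scan, String restoreDir) {
      this.snapshotName = snapshotName;
      this.region = region;
      this.scan = scan;
      this.restoreDir = restoreDir;
    }

    @Override
    public String toString() {
      return snapshotName + " region=" + region + " scan=" + scan + " dir=" + restoreDir;
    }
  }

  /**
   * Builds one split per (snapshot, scan, region) combination. Nothing is
   * written to a shared job configuration; each split is self-describing.
   */
  static List<SnapshotSplit> getSplits(Map<String, List<String>> snapshotScans,
      Map<String, List<String>> regionsBySnapshot, String baseRestoreDir) {
    List<SnapshotSplit> splits = new ArrayList<>();
    for (Map.Entry<String, List<String>> entry : snapshotScans.entrySet()) {
      String snapshot = entry.getKey();
      String restoreDir = baseRestoreDir + "/" + snapshot;
      List<String> regions =
          regionsBySnapshot.getOrDefault(snapshot, Collections.<String>emptyList());
      for (String scan : entry.getValue()) {
        for (String region : regions) {
          splits.add(new SnapshotSplit(snapshot, region, scan, restoreDir));
        }
      }
    }
    return splits;
  }

  public static void main(String[] args) {
    Map<String, List<String>> scans = new LinkedHashMap<>();
    scans.put("snap_a", Arrays.asList("scan1", "scan2"));
    scans.put("snap_b", Arrays.asList("scan3"));
    Map<String, List<String>> regions = new LinkedHashMap<>();
    regions.put("snap_a", Arrays.asList("r1", "r2"));
    regions.put("snap_b", Arrays.asList("r3"));
    for (SnapshotSplit split : getSplits(scans, regions, "/tmp/restore")) {
      System.out.println(split);
    }
  }
}
```

Because each split is self-describing, a record reader built from it would not need to consult per-scan settings in the job configuration, which is the property the proposal relies on.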

Tests:

Any implementation should probably pass at least the tests for 
MultiTableInputFormat, and possibly some of the tests for 
TableSnapshotInputFormat as well.

Thoughts?

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Priority: Minor
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-03-30 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387504#comment-14387504
 ] 

Enis Soztutar commented on HBASE-13356:
---

Sounds good. 
bq.  TableSnapshotInputFormatImpl.InputSplit take in a scan object and restoreDir path
We can create sub dirs in the restoreDir from user for each scan. 
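
A rough sketch of this suggestion, assuming one subdirectory per snapshot (one per scan would work the same way). The helper name snapshotToRestoreDir and the random UUID suffix are assumptions made for the example, and java.nio.file.Path stands in for Hadoop's Path so the snippet runs without a cluster.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class RestoreDirSketch {

  /**
   * Maps each snapshot name to its own subdirectory of baseRestoreDir.
   * The UUID suffix (an assumption of this sketch) keeps directories from
   * colliding when the same base directory is reused.
   */
  static Map<String, Path> snapshotToRestoreDir(Collection<String> snapshots,
      Path baseRestoreDir) {
    Map<String, Path> result = new LinkedHashMap<>();
    for (String snapshot : snapshots) {
      result.put(snapshot, baseRestoreDir.resolve(snapshot + "__" + UUID.randomUUID()));
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Path> dirs =
        snapshotToRestoreDir(Arrays.asList("snap_a", "snap_b"), Paths.get("/tmp/restore"));
    for (Map.Entry<String, Path> entry : dirs.entrySet()) {
      System.out.println(entry.getKey() + " -> " + entry.getValue());
    }
  }
}
```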




[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-03-30 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387638#comment-14387638
 ] 

Andrew Mains commented on HBASE-13356:
--

> We can create sub dirs in the restoreDir from user for each scan.

Good call. I've got a basic implementation of this; will add some more tests 
and post a patch soon. 



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512859#comment-14512859
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12728190/HBASE-13356.patch
  against master branch at commit cd83d39fb4f50db901b699ba5470b5f709c95c69.
  ATTACHMENT ID: 12728190

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 12 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 4 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1965 checkstyle errors (more than the master's current 1900 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 7 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ * MultiTableSnapshotInputFormat generalizes {@link org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat}
+ * allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each.
+ * Internally, the input format delegates to {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}
+ * and thus has the same performance advantages; see {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat} for
+ * Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes in a map
+ * from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan will be applied;
+ * the overall dataset for the job is defined by the concatenation of the regions and tables included in each snapshot/scan
+ * {@link org.apache.hadoop.hbase.mapred.TableMapReduceUtil#initMultiTableSnapshotMapperJob(Map, Class, Class, Class, JobConf, boolean, Path)}
+ * Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and
+ * record readers are created as described in {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13816//console

This message is automatically generated.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-25 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512866#comment-14512866
 ] 

Ted Yu commented on HBASE-13356:


MultiTableSnapshotInputFormat.java and MultiTableSnapshotInputFormatImpl.java 
need the Apache license header. Please also add an audience annotation 
(@InterfaceAudience) to each.

There're several long lines - please limit line width to 100 characters.
{code}
   *  Sets up the job for reading from one or more multiple table snapshots, with one or more scan per snapshot.
{code}
Should 'one or more multiple table snapshots' be 'one or more table snapshots' ?
nit: 'one or more scan' -> 'one or more scans'
{code}
public class MultiTableSnapshotInputFormatImpl {

  private static final Log LOG = LogFactory.getLog(MultiTableSnapshotInputFormat.class);
{code}
Classname for LOG doesn't match the real classname.
{code}
for (TableSnapshotInputFormatImpl.InputSplit split : splits) {
  rtn.add(split);
}
{code}
Can you use List.addAll(Collection) instead? See 
https://docs.oracle.com/javase/7/docs/api/java/util/List.html#addAll(java.util.Collection)
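
For illustration, the two forms side by side in a self-contained snippet. The class and method names are hypothetical and plain strings stand in for the split objects; only List.addAll itself is the real JDK call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AddAllSketch {

  /** Element-by-element copy, as in the quoted loop. */
  static List<String> copyWithLoop(List<String> splits) {
    List<String> rtn = new ArrayList<>();
    for (String split : splits) {
      rtn.add(split);
    }
    return rtn;
  }

  /** Same result with a single bulk call. */
  static List<String> copyWithAddAll(List<String> splits) {
    List<String> rtn = new ArrayList<>();
    rtn.addAll(splits);
    return rtn;
  }

  public static void main(String[] args) {
    List<String> splits = Arrays.asList("split1", "split2", "split3");
    System.out.println(copyWithLoop(splits).equals(copyWithAddAll(splits)));
  }
}
```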
{code}
private Map generateSnapshotToRestoreDir(Collection snapshots, Path baseRestoreDir) {
{code}
Name the method generateSnapshotToRestoreDirMapping().




[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-26 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513444#comment-14513444
 ] 

Ted Yu commented on HBASE-13356:


{code}
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class MultiTableSnapshotInputFormatImpl {
{code}
Does MultiTableSnapshotInputFormatImpl have to be public ? It can be 
LimitedPrivate, right ?
{code}
public final class ConfigurationUtil {
{code}
ConfigurationUtil should be annotated with @InterfaceAudience.Public

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356.2.patch, HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-26 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513456#comment-14513456
 ] 

Andrew Mains commented on HBASE-13356:
--

> Does MultiTableSnapshotInputFormatImpl have to be public ? It can be 
> LimitedPrivate, right ?

Probably so, yeah (I was going off of TableSnapshotInputFormatImpl, which is 
Public, but LimitedPrivate makes more sense for an implementation class).

> ConfigurationUtil should be annotated with @InterfaceAudience.Public

Done.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356.2.patch, HBASE-13356.3.patch, 
> HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513474#comment-14513474
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12728280/HBASE-13356.2.patch
  against master branch at commit 75507af9f80716a2dac2dd3d1642d70373976915.
  ATTACHMENT ID: 12728280

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.util.TestHBaseFsck

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13822//testReport/
Release Findbugs (version 2.0.3) warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13822//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13822//artifact/patchprocess/checkstyle-aggregate.html

  Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/13822//console

This message is automatically generated.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-26 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513548#comment-14513548
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12728296/HBASE-13356.3.patch
  against master branch at commit 75507af9f80716a2dac2dd3d1642d70373976915.
  ATTACHMENT ID: 12728296

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.util.TestHBaseFsck

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/13825//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/13825//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/13825//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/13825//console

This message is automatically generated.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356.2.patch, HBASE-13356.3.patch, 
> HBASE-13356.patch
>
>
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs 
> over live tables (via MultiTableInputFormat) but only supports a single scan 
> for mapreduce jobs over table snapshots. It would be handy to support 
> multiple scans over snapshots as well, probably through another input format 
> (MultiTableSnapshotInputFormat?). To mimic the functionality present in 
> MultiTableInputFormat, the new input format would likely have to take in the 
> names of all snapshots used in addition to the scans.





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-04-27 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515127#comment-14515127
 ] 

Andrew Mains commented on HBASE-13356:
--

Hmm looks like that same test is failing on master at the moment; I'll try to 
rerun once that's fixed.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-22 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14556044#comment-14556044
 ] 

Ted Yu commented on HBASE-13356:


When I applied patch v3 to trunk, I got several rejected hunks in 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java

Mind updating the patch against the latest trunk?

Thanks



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566827#comment-14566827
 ] 

Andrew Mains commented on HBASE-13356:
--

Just updated, and confirmed that the v4 patch applies using smart-apply-patch. 
Let me know if there are any other issues.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356.2.patch, HBASE-13356.3.patch, 
> HBASE-13356.4.patch, HBASE-13356.patch





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566856#comment-14566856
 ] 

Ted Yu commented on HBASE-13356:


Do you mind attaching a patch for branch-1?

There are some conflicts in 
hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java

The patch name for branch-1 should contain branch-1, e.g. 13356-branch-1.patch

Thanks

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356-0.98.patch, HBASE-13356.2.patch, 
> HBASE-13356.3.patch, HBASE-13356.4.patch, HBASE-13356.patch





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566877#comment-14566877
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12736474/HBASE-13356.4.patch
  against master branch at commit 0e6102a68cc95f0240fa72a5f86866c07b8744b7.
  ATTACHMENT ID: 12736474

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 14 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14250//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14250//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14250//artifact/patchprocess/checkstyle-aggregate.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14250//console

This message is automatically generated.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566880#comment-14566880
 ] 

Ted Yu commented on HBASE-13356:


TestHFileArchiving test failure is not related to the patch.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566909#comment-14566909
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12736478/HBASE-13356-0.98.patch
  against 0.98 branch at commit 0e6102a68cc95f0240fa72a5f86866c07b8744b7.
  ATTACHMENT ID: 12736478

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
24 warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.backup.TestHFileArchiving
  org.apache.hadoop.hbase.master.TestTableLockManager

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14251//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14251//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14251//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14251//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14251//console

This message is automatically generated.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-05-31 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14566923#comment-14566923
 ] 

Ted Yu commented on HBASE-13356:


Looks pretty good.
Minor comments:
{code}
+ * MultiTableSnapshotInputFormat generalizes {@link org.apache.hadoop.hbase.mapred
+ * .TableSnapshotInputFormat}
{code}
Better to put '{@link ' on the second line so that the class name stays on a single line.

In MultiTableSnapshotInputFormatImpl:
{code}
+  // TODO: these probably belong elsewhere/may already be implemented elsewhere.
+
{code}
The above can be removed, right?



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-01 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567072#comment-14567072
 ] 

Hadoop QA commented on HBASE-13356:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12736497/HBASE-13356-branch-1.patch
  against branch-1 branch at commit 0e6102a68cc95f0240fa72a5f86866c07b8744b7.
  ATTACHMENT ID: 12736497

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 17 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.1 2.5.2 2.6.0)

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/14253//testReport/
Release Findbugs (version 2.0.3) warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14253//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors:
https://builds.apache.org/job/PreCommit-HBASE-Build/14253//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/14253//artifact/patchprocess/patchJavadocWarnings.txt
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/14253//console

This message is automatically generated.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Attachments: HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, 
> HBASE-13356.2.patch, HBASE-13356.3.patch, HBASE-13356.4.patch, 
> HBASE-13356.patch





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-01 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567979#comment-14567979
 ] 

Ted Yu commented on HBASE-13356:


[~apurtell] [~enis] [~ndimiduk] :
Please take a look.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569510#comment-14569510
 ] 

Nick Dimiduk commented on HBASE-13356:
--

This is adding new functionality, so -1 for branch-1.1 and branch-1.0.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14569693#comment-14569693
 ] 

Ted Yu commented on HBASE-13356:


Planning to commit to master and branch-1 branches soon.

> HBase should provide an InputFormat supporting multiple scans in mapreduce 
> jobs over snapshots
> --
>
> Key: HBASE-13356
> URL: https://issues.apache.org/jira/browse/HBASE-13356
> Project: HBase
> Issue Type: New Feature
> Components: mapreduce
> Reporter: Andrew Mains
> Assignee: Andrew Mains
> Priority: Minor
> Fix For: 2.0.0, 1.2.0
>
> Attachments: HBASE-13356-0.98.patch, HBASE-13356-branch-1.patch, 
> HBASE-13356.2.patch, HBASE-13356.3.patch, HBASE-13356.4.patch, 
> HBASE-13356.patch





[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570173#comment-14570173
 ] 

Ted Yu commented on HBASE-13356:


Integrated to branch-1 and master.

Thanks for the patch, Andrew.



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Andrew Mains (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570177#comment-14570177
 ] 

Andrew Mains commented on HBASE-13356:
--

No problem! Thanks for the review, and the patience with my noob formatting 
issues :)



[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570252#comment-14570252
 ] 

Hudson commented on HBASE-13356:


FAILURE: Integrated in HBase-1.2 #130 (See 
[https://builds.apache.org/job/HBase-1.2/130/])
HBASE-13356 HBase should provide an InputFormat supporting multiple scans in 
mapreduce jobs over snapshots (Andrew Mains) (tedyu: rev 
39ab55841d1f6acb649974367e1e3b5be914c017)
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/MultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan2.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormatImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatTestBase.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapred/TestMultiTableSnapshotInputFormat.java


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots

2015-06-02 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14570387#comment-14570387 ]

Hudson commented on HBASE-13356:


SUCCESS: Integrated in HBase-TRUNK #6542 (See 
[https://builds.apache.org/job/HBase-TRUNK/6542/])
HBASE-13356 HBase should provide an InputFormat supporting multiple scans in 
mapreduce jobs over snapshots (Andrew Mains) (tedyu: rev 
722fd17069a302f4de12c22212d54d80bed81aed)
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormatImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestConfigurationUtil.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableSnapshotInputFormat.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/TableMapReduceUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormatImpl.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapred/TestMultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/util/ConfigurationUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/MultiTableInputFormatTestBase.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapred/MultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/MultiTableSnapshotInputFormat.java
* hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableMapReduceUtil.java
* hbase-server/src/test/java/org/apache/hadoop/hbase/mapreduce/TestMultiTableSnapshotInputFormatImpl.java

