[ https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064334#comment-16064334 ]
Hadoop QA commented on HBASE-18161:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 6s {color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 73m 38s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 220m 25s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 47s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 332m 33s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionReplicaFailover |
| | hadoop.hbase.client.TestMobSnapshotCloneIndependence |
| | hadoop.hbase.regionserver.TestEncryptionKeyRotation |
| | hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
| | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
| Timed out junit tests | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
| | org.apache.hadoop.hbase.client.TestFromClientSide3 |
| | org.apache.hadoop.hbase.quotas.TestSpaceQuotas |
| | org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor |
| | org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClient |
| | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874594/MultiHFileOutputFormatSupport_HBASE_18161_v11.patch |
| JIRA Issue | HBASE-18161 |
| Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile |
| uname | Linux af818bf1f967 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 35693f0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/patch-unit-hbase-server.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Incremental Load support for Multiple-Table HFileOutputFormat
> -------------------------------------------------------------
>
>                 Key: HBASE-18161
>                 URL: https://issues.apache.org/jira/browse/HBASE-18161
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Densel Santhmayor
>            Priority: Minor
>         Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v10.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v11.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v2.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v3.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v4.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v5.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v6.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v7.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v8.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports the ability to write HBase records in bulk to
> HFiles for a single table. The file(s) can then be uploaded to the relevant
> RegionServers with reasonable latency. This feature is useful for making a
> large set of data available for queries at the same time, and it provides a
> way to efficiently load very large input into HBase without affecting query
> latencies.
> There is, however, no support for writing variations of the same record key
> to HFiles belonging to multiple HBase tables from within the same MapReduce
> job.
>
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to
> HFiles for different tables within the same MapReduce job while keeping
> single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table
> for quicker access, and the same record key with a date appended to a larger
> table for longer-term storage with chronological access. Each of these tables
> would have different TTL and other settings to support their respective
> access patterns. We also needed to be able to bulk write records to multiple
> tables with different subsets of very large input as efficiently as possible.
> Rather than run the MapReduce job multiple times (one for each table or
> record structure), it would be useful to be able to parse the input a single
> time and write to multiple tables simultaneously.
> Additionally, we'd like to maintain backwards compatibility with the existing
> heavily-used HFileOutputFormat2 interface so that benefits such as locality
> sensitivity (which was introduced long after we implemented support for
> multiple tables) apply to both single-table and multi-table HFile writes.
>
> h2. Proposal
> * Backwards compatibility for existing single-table support in
> HFileOutputFormat2 will be maintained and, in this case, mappers will need to
> emit the table rowkey as before. However, a new class - 
> MultiHFileOutputFormat - will provide a helper function that generates a
> rowkey for mappers by prefixing the desired tablename to the existing rowkey,
> and will also provide configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and
> region locator pairs, analogous to the single pair currently accepted by
> HFileOutputFormat2.
> ** Compression, Block Size, Bloom Type and Datablock settings PER column
> family that are set in the Configuration object are now indexed and retrieved
> by tablename AND column family.
> ** getRegionStartKeys will now support multiple regionlocators and calculate
> split points, and therefore partitions, collectively for all tables.
> Similarly, the eventual number of Reducers will now be equal to the total
> number of partitions across all tables.
> ** The RecordWriter class will be able to process rowkeys either with or
> without the tablename prepended, depending on whether configureIncrementalLoad
> was configured through MultiHFileOutputFormat or HFileOutputFormat2.
> * MultiHFileOutputFormat will write its output into HFiles that match the
> output format of HFileOutputFormat2. However, while the default use case will
> keep the existing directory structure, with the column family name as the
> directory and HFiles within that directory, MultiHFileOutputFormat will
> output HFiles in the output directory with the following relative paths:
> {noformat}
>      --table1
>        --family1
>          --HFiles
>      --table2
>        --family1
>        --family2
>          --HFiles
> {noformat}
> This aims to be a comprehensive solution to the original tickets - HBASE-3727
> and HBASE-16261. Thanks to [~clayb] for his support. This is a contribution
> from Bloomberg developers.
> The patch will be attached shortly.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)