[jira] [Commented] (HBASE-22380) break circle replication when doing bulkload
[ https://issues.apache.org/jira/browse/HBASE-22380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935208#comment-16935208 ] HBase QA commented on HBASE-22380: -- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 1m 33s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} dupname {color} | {color:green} 0m 0s{color} | {color:green} No case conflicting files found. {color} | | {color:blue}0{color} | {color:blue} prototool {color} | {color:blue} 0m 0s{color} | {color:blue} prototool was not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 1s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 26 new or modified test files. 
{color} | || || || || {color:brown} branch-2.1 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 14s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 21s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 51s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 22s{color} | {color:green} branch-2.1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 14s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 2s{color} | {color:green} branch-2.1 passed {color} | | {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 2m 47s{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 18s{color} | {color:green} branch-2.1 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 4m 7s{color} | {color:red} root in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 50s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 29s{color} | {color:red} hbase-server: The patch generated 4 new + 312 unchanged - 3 fixed = 316 total (was 315) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:red}-1{color} | {color:red} shadedjars {color} | {color:red} 3m 40s{color} | {color:red} patch has 264 errors when building our shaded downstream artifacts. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 2m 13s{color} | {color:red} The patch causes 264 errors with Hadoop v2.7.7. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 4m 32s{color} | {color:red} The patch causes 264 errors with Hadoop v2.8.5. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 6m 55s{color} | {color:red} The patch causes 264 errors with Hadoop v3.0.3. {color} | | {color:red}-1{color} | {color:red} hadoopcheck {color} | {color:red} 9m 12s{color} | {color:red} The patch causes 264 errors with Hadoop v3.1.2. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 1m 38s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 59s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 6m 49s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 36s{color} | {color:green}
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935112#comment-16935112 ] Hudson commented on HBASE-23058: Results for branch branch-2.2 [build #632 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/632/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/632//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/632//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/632//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935099#comment-16935099 ] Hudson commented on HBASE-23058: Results for branch branch-2 [build #2292 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2292/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2292//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2292//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/2292//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-22380) break circle replication when doing bulkload
[ https://issues.apache.org/jira/browse/HBASE-22380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-22380: - Status: Patch Available (was: In Progress) > break circle replication when doing bulkload > > > Key: HBASE-22380 > URL: https://issues.apache.org/jira/browse/HBASE-22380 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.5, 2.1.5, 2.0.5, 1.4.10, 2.2.0, 3.0.0, 1.5.0, 2.3.0 >Reporter: chenxu >Assignee: Wellington Chevreuil >Priority: Critical > Labels: bulkload > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.1.7, 2.2.2, 1.4.12 > > Attachments: HBASE-22380.branch-2.1.0001.patch > > > when enabled master-master bulkload replication, HFiles will be replicated > circularly between two clusters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22380) break circle replication when doing bulkload
[ https://issues.apache.org/jira/browse/HBASE-22380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935097#comment-16935097 ] Wellington Chevreuil commented on HBASE-22380: -- Pushed both PRs to master and branch-2, respectively. While cherry-picking the branch-2 commit into branch-2.1, I faced many conflicts that required manual fixes, so I thought it worth submitting a patch to go through the pre-commit tests. If the tests pass, I'll proceed with the actual commit. > break circle replication when doing bulkload > > > Key: HBASE-22380 > URL: https://issues.apache.org/jira/browse/HBASE-22380 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.0.5, 2.3.0, 2.1.5, 1.3.5 >Reporter: chenxu >Assignee: Wellington Chevreuil >Priority: Critical > Labels: bulkload > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.1.7, 2.2.2, 1.4.12 > > Attachments: HBASE-22380.branch-2.1.0001.patch > > > when enabled master-master bulkload replication, HFiles will be replicated > circularly between two clusters -- This message was sent by Atlassian Jira (v8.3.4#803005)
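The cycle HBASE-22380 describes — each cluster treating the other's bulk-loaded HFiles as new local data and replicating them back — can be broken by carrying provenance with the bulk load event. The sketch below is a minimal standalone model of that idea, not HBase's actual API: the class and method names (`BulkLoadEvent`, `Cluster.replicate`) are hypothetical, and the real fix works on replication markers in the WAL.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of breaking circular bulk-load replication between
// master-master peers: each event carries the IDs of the clusters it
// has already visited, and a cluster never forwards an event to a peer
// whose ID is already in that provenance list.
final class BulkLoadEvent {
    final String hfile;
    final List<String> sourceClusterIds; // provenance, oldest first

    BulkLoadEvent(String hfile, List<String> sourceClusterIds) {
        this.hfile = hfile;
        this.sourceClusterIds = sourceClusterIds;
    }
}

final class Cluster {
    final String clusterId;
    final List<BulkLoadEvent> replicatedOut = new ArrayList<>();

    Cluster(String clusterId) { this.clusterId = clusterId; }

    /** Replicate an event to a peer unless it would loop back. */
    void replicate(BulkLoadEvent event, Cluster peer) {
        // Break the circle: the peer has already seen this file.
        if (event.sourceClusterIds.contains(peer.clusterId)) {
            return;
        }
        List<String> provenance = new ArrayList<>(event.sourceClusterIds);
        provenance.add(clusterId); // stamp ourselves before forwarding
        BulkLoadEvent forwarded = new BulkLoadEvent(event.hfile, provenance);
        replicatedOut.add(forwarded);
        peer.receive(forwarded, this);
    }

    void receive(BulkLoadEvent event, Cluster from) {
        // In master-master setups the receiver's own peer is the sender,
        // so without the provenance check this would recurse forever.
        replicate(event, from);
    }
}

class CircularReplicationSketch {
    public static void main(String[] args) {
        Cluster a = new Cluster("cluster-A");
        Cluster b = new Cluster("cluster-B");
        a.replicate(new BulkLoadEvent("hfile-1", new ArrayList<>()), b);
        System.out.println(a.replicatedOut.size()); // A forwarded once
        System.out.println(b.replicatedOut.size()); // B suppressed the loop
    }
}
```

Without the `contains` check, `replicate` and `receive` would bounce the same HFile between the two clusters indefinitely, which is exactly the reported behavior.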
[jira] [Updated] (HBASE-22380) break circle replication when doing bulkload
[ https://issues.apache.org/jira/browse/HBASE-22380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wellington Chevreuil updated HBASE-22380: - Attachment: HBASE-22380.branch-2.1.0001.patch > break circle replication when doing bulkload > > > Key: HBASE-22380 > URL: https://issues.apache.org/jira/browse/HBASE-22380 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 3.0.0, 1.5.0, 2.2.0, 1.4.10, 2.0.5, 2.3.0, 2.1.5, 1.3.5 >Reporter: chenxu >Assignee: Wellington Chevreuil >Priority: Critical > Labels: bulkload > Fix For: 3.0.0, 1.5.0, 2.3.0, 2.1.7, 2.2.2, 1.4.12 > > Attachments: HBASE-22380.branch-2.1.0001.patch > > > when enabled master-master bulkload replication, HFiles will be replicated > circularly between two clusters -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935085#comment-16935085 ] Hudson commented on HBASE-23058: Results for branch master [build #1471 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/1471/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/1471//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1471//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/1471//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935082#comment-16935082 ] Hudson commented on HBASE-23058: Results for branch branch-2.1 [build #1628 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1628/]: (/) *{color:green}+1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1628//General_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1628//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1628//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (/) {color:green}+1 client integration test{color} > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (HBASE-23052) hbase-thirdparty version of GSON that works for branch-1
[ https://issues.apache.org/jira/browse/HBASE-23052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey resolved HBASE-23052. - Fix Version/s: thirdparty-3.0.0 Release Note: HBase's internal use of GSON is now done in a stand alone module named `hbase-shaded-gson` rather than as a part of the `hbase-shaded-miscellaneous` module. The relocated fully qualified class names are still the same. This internal artifact is also set to maintain JDK bytecode compatibility as appropriate for use with branches-1 based releases in addition to the existing use in later release lines. Resolution: Fixed > hbase-thirdparty version of GSON that works for branch-1 > > > Key: HBASE-23052 > URL: https://issues.apache.org/jira/browse/HBASE-23052 > Project: HBase > Issue Type: Improvement > Components: dependencies >Affects Versions: thirdparty-2.2.1 >Reporter: Sean Busbey >Assignee: Sean Busbey >Priority: Blocker > Fix For: thirdparty-3.0.0 > > > HBASE-23015 is buttoning up a needed move off of jackson 1 in branches-1. > We've already got the implementation work in place to move onto the > hbase-thirdparty relocated GSON, but we can't currently build because other > dependencies included in the miscellaneous module is JDK8+ only and branch-1 > needs to work for jdk7. > couple of options: > * make the entire hbase-thirdparty repo work with jdk7 > * break out gson from the clearing house miscellaneous module and make *just* > the new gson module jdk7 compatible > * make a jdk7 compatible miscellaneous module and move gson over there (in > case we decide to move branch-1 off of other problematic libraries e.g. guava) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935074#comment-16935074 ] Hudson commented on HBASE-23058: Results for branch branch-1 [build #1079 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1079/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1079//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1079//JDK7_Nightly_Build_Report/] (/) {color:green}+1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/1079//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935050#comment-16935050 ] Hudson commented on HBASE-23058: Results for branch branch-1.4 [build #1025 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1025/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1025//General_Nightly_Build_Report/] (x) {color:red}-1 jdk7 checks{color} -- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1025//JDK7_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/1025//JDK8_Nightly_Build_Report_(Hadoop2)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hbase] wchevreuil merged pull request #494: HBASE-22380 break circle replication when doing bulkload
wchevreuil merged pull request #494: HBASE-22380 break circle replication when doing bulkload URL: https://github.com/apache/hbase/pull/494 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935042#comment-16935042 ] Guanghao Zhang commented on HBASE-23035: Pushed to branch-2.2+. > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.2 > > > Now if one RS crashes, the regions will try to use the old location for > region deployment. But one RS only has 3 threads to open regions by default. If a > RS has hundreds of regions, the failover is very slow. Assigning to the same RS > may give good locality if the DataNode is deployed on the same host. But slower > failover makes the availability worse. And the locality is not a big deal when > deploying HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
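The slowdown the issue describes is simple queueing arithmetic: with a fixed-size open-region executor (3 threads by default, per the issue text), reopening N retained regions on one recovering RegionServer takes roughly ceil(N/3) sequential rounds, while spreading them across the cluster parallelizes the work. A back-of-the-envelope sketch (the region counts are illustrative assumptions, not from the issue):

```java
// Why retaining regions on the last RegionServer slows failover:
// opens funnel through a small per-RS thread pool, so recovery time
// grows with ceil(regions / openThreads) sequential rounds.
class FailoverEstimate {
    /** Rounds of region opens needed with a fixed-size open-region pool. */
    static long openRounds(long regions, long openThreads) {
        return (regions + openThreads - 1) / openThreads; // ceiling division
    }

    public static void main(String[] args) {
        long defaultThreads = 3; // default open-region threads per RS
        // 300 regions retained on one RS: 100 sequential rounds of opens.
        System.out.println(openRounds(300, defaultThreads)); // 100
        // The same 300 regions spread over 30 RSes (10 each): 4 rounds.
        System.out.println(openRounds(300 / 30, defaultThreads)); // 4
    }
}
```

This is the trade-off the issue weighs: retained assignment buys HDFS locality at the cost of a 25x-longer serialized recovery in this example.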
[jira] [Updated] (HBASE-23035) Retain region to the last RegionServer make the failover slower
[ https://issues.apache.org/jira/browse/HBASE-23035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-23035: --- Fix Version/s: 2.2.2 2.3.0 3.0.0 > Retain region to the last RegionServer make the failover slower > --- > > Key: HBASE-23035 > URL: https://issues.apache.org/jira/browse/HBASE-23035 > Project: HBase > Issue Type: Bug >Affects Versions: 3.0.0, 2.3.0, 2.2.1, 2.1.6 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Major > Fix For: 3.0.0, 2.3.0, 2.2.2 > > > Now if one RS crashes, the regions will try to use the old location for > region deployment. But one RS only has 3 threads to open regions by default. If a > RS has hundreds of regions, the failover is very slow. Assigning to the same RS > may give good locality if the DataNode is deployed on the same host. But slower > failover makes the availability worse. And the locality is not a big deal when > deploying HBase on cloud. > This was introduced by HBASE-18946. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [hbase] infraio merged pull request #631: HBASE-23035 Retain region to the last RegionServer make the failover …
infraio merged pull request #631: HBASE-23035 Retain region to the last RegionServer make the failover … URL: https://github.com/apache/hbase/pull/631 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] wchevreuil merged pull request #566: HBASE-22380 break circle replication when doing bulkload
wchevreuil merged pull request #566: HBASE-22380 break circle replication when doing bulkload URL: https://github.com/apache/hbase/pull/566 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on issue #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on issue #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#issuecomment-533793894 @joshelser, thanks for the review. Did all the changes and pushed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858250 ## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestSpaceQuotaBasicFunctioning.java ## @@ -221,4 +225,30 @@ public void testTableQuotaOverridesNamespaceQuota() throws Exception { Bytes.toBytes("reject")); helper.verifyViolation(policy, tn, p); } + + @Test + public void testDisablePolicyQuotaAndViolate() throws Exception { +TableName tableName = helper.createTable(); +helper.setQuotaLimit(tableName, SpaceViolationPolicy.DISABLE, 2L); +helper.writeData(tableName, SpaceQuotaHelperForTests.ONE_MEGABYTE * 3L); + +HMaster master = TEST_UTIL.getMiniHBaseCluster().getMaster(); +MasterQuotaManager quotaManager = master.getMasterQuotaManager(); + +// Sufficient time for all the chores to run. +Thread.sleep(5000); Review comment: done. added a wait predicate. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
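The "wait predicate" the reviewer asked for, replacing the fixed `Thread.sleep(5000)` in the diff above, is the standard poll-until-condition-or-timeout pattern (in HBase tests it is typically `Waiter.waitFor` / `TEST_UTIL.waitFor`). A minimal self-contained equivalent of the pattern, with hypothetical names:

```java
import java.util.function.BooleanSupplier;

// Generic polling wait: check a condition repeatedly until it holds or a
// timeout expires, instead of sleeping a fixed "hopefully long enough"
// interval. Stops early on success, so tests are both faster and less flaky.
class WaitPredicate {
    static boolean waitFor(long timeoutMs, long pollIntervalMs, BooleanSupplier cond)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (cond.getAsBoolean()) {
                return true; // condition satisfied, no need to keep waiting
            }
            Thread.sleep(pollIntervalMs);
        }
        return cond.getAsBoolean(); // one final check at the deadline
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Condition becomes true after ~200ms; we return well before the
        // 5-second timeout instead of always paying it in full.
        boolean ok = waitFor(5000, 50,
            () -> System.currentTimeMillis() - start > 200);
        System.out.println(ok); // true
    }
}
```

In the test under review, the `BooleanSupplier` would assert that the quota chores have published a snapshot for the table, rather than assuming 5 seconds is always enough.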
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858245 ## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestSpaceQuotaBasicFunctioning.java ## @@ -221,4 +225,30 @@ public void testTableQuotaOverridesNamespaceQuota() throws Exception { Bytes.toBytes("reject")); helper.verifyViolation(policy, tn, p); } + + @Test + public void testDisablePolicyQuotaAndViolate() throws Exception { +TableName tableName = helper.createTable(); +helper.setQuotaLimit(tableName, SpaceViolationPolicy.DISABLE, 2L); +helper.writeData(tableName, SpaceQuotaHelperForTests.ONE_MEGABYTE * 3L); Review comment: changed to 1MB. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858253 ## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestSpaceQuotaBasicFunctioning.java ## @@ -221,4 +225,30 @@ public void testTableQuotaOverridesNamespaceQuota() throws Exception { Bytes.toBytes("reject")); helper.verifyViolation(policy, tn, p); } + + @Test + public void testDisablePolicyQuotaAndViolate() throws Exception { +TableName tableName = helper.createTable(); +helper.setQuotaLimit(tableName, SpaceViolationPolicy.DISABLE, 2L); +helper.writeData(tableName, SpaceQuotaHelperForTests.ONE_MEGABYTE * 3L); + +HMaster master = TEST_UTIL.getMiniHBaseCluster().getMaster(); +MasterQuotaManager quotaManager = master.getMasterQuotaManager(); + +// Sufficient time for all the chores to run. +Thread.sleep(5000); + +long timeToPrune = System.currentTimeMillis() + 11 * 60 * 1000; +quotaManager.pruneEntriesOlderThan(timeToPrune); + +// Check if disabled table region report present in the map after retention period expired. +// It should be present after retention period expired. +for (Map.Entry entry : quotaManager.snapshotRegionSizes().entrySet()) { Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858243 ## File path: hbase-server/src/test/java/org/apache/hadoop/hbase/quotas/TestMasterQuotaManager.java ## @@ -51,6 +55,10 @@ public void testOldEntriesRemoved() { MasterServices masterServices = mock(MasterServices.class); MasterQuotaManager manager = new MasterQuotaManager(masterServices); manager.initializeRegionSizes(); +HBaseTestingUtility TEST_UTIL = new HBaseTestingUtility(); +Configuration conf = TEST_UTIL.getConfiguration(); +conf.set(QUOTA_CONF_KEY, "false"); +when(masterServices.getConfiguration()).thenReturn(conf); Review comment: removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858240 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java ## @@ -702,15 +703,53 @@ int pruneEntriesOlderThan(long timeToPruneBefore) { Iterator> iterator = regionSizes.entrySet().iterator(); while (iterator.hasNext()) { - long currentEntryTime = iterator.next().getValue().getTime(); - if (currentEntryTime < timeToPruneBefore) { + RegionInfo regionInfo = iterator.next().getKey(); + long currentEntryTime = regionSizes.get(regionInfo).getTime(); + boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); + // do not prune the entries if table is in violation and + // violation policy is DISABLE. Prune entries older than time. + if (currentEntryTime < timeToPruneBefore && !isInViolationAndPolicyDisable) { iterator.remove(); numEntriesRemoved++; } } return numEntriesRemoved; } + /** + * Method to check if a table is in violation and policy set on table is DISABLE. + * + * @param tableName tableName to check. + * @return returns true if table is in violation and policy is DISABLE, else false. 
+ */ + private boolean isInViolationAndPolicyDisable(TableName tableName) { +boolean isInViolationAtTable = false; +boolean isInViolationAndPolicyDisable = false; +SpaceViolationPolicy policy = null; +try { + if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { +// Get Current Snapshot for the given table +SpaceQuotaSnapshot spaceQuotaSnapshot = + QuotaUtil.getCurrentSnapshotFromQuotaTable(masterServices.getConnection(), tableName); +if (spaceQuotaSnapshot != null) { + // check if table in violation + isInViolationAtTable = spaceQuotaSnapshot.getQuotaStatus().isInViolation(); + Optional<SpaceViolationPolicy> policyAtNamespace = + spaceQuotaSnapshot.getQuotaStatus().getPolicy(); + if (policyAtNamespace.isPresent()) { +policy = policyAtNamespace.get(); + } +} + } + isInViolationAndPolicyDisable = Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858217 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java ## @@ -702,15 +703,53 @@ int pruneEntriesOlderThan(long timeToPruneBefore) { Iterator> iterator = regionSizes.entrySet().iterator(); while (iterator.hasNext()) { - long currentEntryTime = iterator.next().getValue().getTime(); - if (currentEntryTime < timeToPruneBefore) { + RegionInfo regionInfo = iterator.next().getKey(); + long currentEntryTime = regionSizes.get(regionInfo).getTime(); + boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); + // do not prune the entries if table is in violation and + // violation policy is disable.prune entries older than time. + if (currentEntryTime < timeToPruneBefore && !isInViolationAndPolicyDisable) { iterator.remove(); numEntriesRemoved++; } } return numEntriesRemoved; } + /** + * Method to check if a table is in violation and policy set on table is DISABLE. + * + * @param tableName tableName to check. + * @return returns true if table is in violation and policy is disable else false. + */ + private boolean isInViolationAndPolicyDisable(TableName tableName) { +boolean isInViolationAtTable = false; +boolean isInViolationAndPolicyDisable = false; +SpaceViolationPolicy policy = null; +try { + if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { +// Get Current Snapshot for the given table +SpaceQuotaSnapshot spaceQuotaSnapshot = + QuotaUtil.getCurrentSnapshotFromQuotaTable(masterServices.getConnection(), tableName); Review comment: using ```QuotaObserverChore#getTableQuotaSnapshot(TableName)``` now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858236 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java ## @@ -702,15 +703,53 @@ int pruneEntriesOlderThan(long timeToPruneBefore) { Iterator> iterator = regionSizes.entrySet().iterator(); while (iterator.hasNext()) { - long currentEntryTime = iterator.next().getValue().getTime(); - if (currentEntryTime < timeToPruneBefore) { + RegionInfo regionInfo = iterator.next().getKey(); + long currentEntryTime = regionSizes.get(regionInfo).getTime(); + boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); + // do not prune the entries if table is in violation and + // violation policy is disable.prune entries older than time. + if (currentEntryTime < timeToPruneBefore && !isInViolationAndPolicyDisable) { iterator.remove(); numEntriesRemoved++; } } return numEntriesRemoved; } + /** + * Method to check if a table is in violation and policy set on table is DISABLE. + * + * @param tableName tableName to check. + * @return returns true if table is in violation and policy is disable else false. + */ + private boolean isInViolationAndPolicyDisable(TableName tableName) { +boolean isInViolationAtTable = false; +boolean isInViolationAndPolicyDisable = false; +SpaceViolationPolicy policy = null; +try { + if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { +// Get Current Snapshot for the given table +SpaceQuotaSnapshot spaceQuotaSnapshot = + QuotaUtil.getCurrentSnapshotFromQuotaTable(masterServices.getConnection(), tableName); +if (spaceQuotaSnapshot != null) { + // check if table in violation + isInViolationAtTable = spaceQuotaSnapshot.getQuotaStatus().isInViolation(); + Optional policyAtNamespace = Review comment: done This is an automated message from the Apache Git Service. 
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858201 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java ## @@ -702,15 +703,53 @@ int pruneEntriesOlderThan(long timeToPruneBefore) { Iterator> iterator = regionSizes.entrySet().iterator(); while (iterator.hasNext()) { - long currentEntryTime = iterator.next().getValue().getTime(); - if (currentEntryTime < timeToPruneBefore) { + RegionInfo regionInfo = iterator.next().getKey(); + long currentEntryTime = regionSizes.get(regionInfo).getTime(); + boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); + // do not prune the entries if table is in violation and + // violation policy is disable.prune entries older than time. + if (currentEntryTime < timeToPruneBefore && !isInViolationAndPolicyDisable) { iterator.remove(); numEntriesRemoved++; } } return numEntriesRemoved; } + /** + * Method to check if a table is in violation and policy set on table is DISABLE. + * + * @param tableName tableName to check. + * @return returns true if table is in violation and policy is disable else false. + */ + private boolean isInViolationAndPolicyDisable(TableName tableName) { +boolean isInViolationAtTable = false; +boolean isInViolationAndPolicyDisable = false; +SpaceViolationPolicy policy = null; +try { + if (QuotaUtil.isQuotaEnabled(masterServices.getConfiguration())) { Review comment: removed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [hbase] shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table
shardul-cr7 commented on a change in pull request #572: HBASE-22012 Space Quota: DisableTableViolationPolicy will cause cycles of enable/disable table URL: https://github.com/apache/hbase/pull/572#discussion_r326858195 ## File path: hbase-server/src/main/java/org/apache/hadoop/hbase/quotas/MasterQuotaManager.java ## @@ -702,15 +703,53 @@ int pruneEntriesOlderThan(long timeToPruneBefore) { Iterator> iterator = regionSizes.entrySet().iterator(); while (iterator.hasNext()) { - long currentEntryTime = iterator.next().getValue().getTime(); - if (currentEntryTime < timeToPruneBefore) { + RegionInfo regionInfo = iterator.next().getKey(); + long currentEntryTime = regionSizes.get(regionInfo).getTime(); + boolean isInViolationAndPolicyDisable = isInViolationAndPolicyDisable(regionInfo.getTable()); + // do not prune the entries if table is in violation and + // violation policy is disable.prune entries older than time. Review comment: done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
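Stripped of the review-thread noise, the change under discussion amounts to: when pruning stale region-size entries, keep any entry whose table is currently in violation of a DISABLE space-violation policy, so the chore does not churn through enable/disable cycles. The quoted diff lost its generic type parameters in the archive, so the following is a self-contained sketch with stand-in types (String region keys, a local Policy enum) rather than HBase's actual RegionInfo/SpaceQuotaSnapshot API:

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;

/**
 * Minimal sketch of the prune-with-violation-check logic from the review
 * thread. Names and types (SizeWithTime, Policy, the String-keyed maps) are
 * stand-ins, not HBase's real classes.
 */
class QuotaPruneSketch {

  enum Policy { DISABLE, NO_WRITES, NO_INSERTS }

  /** Stand-in for a per-region size report carrying a timestamp. */
  static class SizeWithTime {
    final long time;
    SizeWithTime(long time) { this.time = time; }
  }

  final Map<String, SizeWithTime> regionSizes = new HashMap<>();
  final Map<String, Policy> violations = new HashMap<>(); // table -> active policy

  /**
   * Prune entries older than the cutoff, but skip entries whose table is in
   * violation of a DISABLE policy, so the table is not repeatedly
   * enabled and disabled.
   */
  int pruneEntriesOlderThan(long timeToPruneBefore) {
    int numEntriesRemoved = 0;
    Iterator<Map.Entry<String, SizeWithTime>> it = regionSizes.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, SizeWithTime> e = it.next();
      String table = tableOf(e.getKey());
      boolean stale = e.getValue().time < timeToPruneBefore;
      if (stale && !isInViolationAndPolicyDisable(table)) {
        it.remove();
        numEntriesRemoved++;
      }
    }
    return numEntriesRemoved;
  }

  /** True if the table is in violation and the active policy is DISABLE. */
  boolean isInViolationAndPolicyDisable(String table) {
    Optional<Policy> policy = Optional.ofNullable(violations.get(table));
    return policy.isPresent() && policy.get() == Policy.DISABLE;
  }

  private static String tableOf(String regionName) {
    // Region keys are "table,startKey" in this sketch.
    return regionName.split(",")[0];
  }
}
```

In the actual patch the violation lookup goes through `QuotaObserverChore#getTableQuotaSnapshot(TableName)` rather than reading the quota table directly, per the second comment above.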
[jira] [Updated] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-23058: --- Fix Version/s: 2.2.2 2.1.7 1.4.11 2.3.0 1.5.0 3.0.0 Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to branch-2.1+, branch-1 and branch-1.4. Thanks [~Qiongwu] for contributing. > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Fix For: 3.0.0, 1.5.0, 2.3.0, 1.4.11, 2.1.7, 2.2.2 > > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-23058) Should be "Column Family Name" in table.jsp
[ https://issues.apache.org/jira/browse/HBASE-23058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935001#comment-16935001 ] Guanghao Zhang commented on HBASE-23058: The failed UT is not related. Let me commit this. > Should be "Column Family Name" in table.jsp > --- > > Key: HBASE-23058 > URL: https://issues.apache.org/jira/browse/HBASE-23058 > Project: HBase > Issue Type: Improvement >Reporter: Qiongwu >Assignee: Qiongwu >Priority: Minor > Attachments: 2019-09-20 19-16-22屏幕截图.png, HBASE-23058.master.001.patch > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (HBASE-22514) Move rsgroup feature into core of HBase
[ https://issues.apache.org/jira/browse/HBASE-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934995#comment-16934995 ] Hudson commented on HBASE-22514: Results for branch HBASE-22514 [build #120 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/120/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/120//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/120//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/120//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. (x) {color:red}-1 client integration test{color} --Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-22514/120//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3) > Move rsgroup feature into core of HBase > --- > > Key: HBASE-22514 > URL: https://issues.apache.org/jira/browse/HBASE-22514 > Project: HBase > Issue Type: Umbrella > Components: Admin, Client, rsgroup >Reporter: Yechao Chen >Assignee: Duo Zhang >Priority: Major > Attachments: HBASE-22514.master.001.patch, > image-2019-05-31-18-25-38-217.png > > > The class RSGroupAdminClient is not public > we need to use java api RSGroupAdminClient to manager RSG > so RSGroupAdminClient should be public > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HBASE-23062) Use TableInputFormat to read data from HBase: when Scan.setCaching(size) the size is too big, some rowkeys will be lost without exceptions.
[ https://issues.apache.org/jira/browse/HBASE-23062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZhanxiongWang updated HBASE-23062: -- Fix Version/s: 1.2.5 > Use TableInputFormat to read data from HBase: when Scan.setCaching(size) the > size is too big, some rowkeys will be lost without exceptions. > --- > > Key: HBASE-23062 > URL: https://issues.apache.org/jira/browse/HBASE-23062 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.6.1 >Reporter: ZhanxiongWang >Priority: Major > Fix For: 1.2.5 > > Attachments: pom.xml > > > I did the experiment in three ways. One way I use spark to read hbase, the second > way I use mapreduce to read hbase. In both cases, when I increase the Scan > Caching size, some data will be lost. To be more accurate: when I set > scan.setCaching(500), I can receive 7622 rows of data, but when I set > scan.setCaching(5), I can receive only 4226 rows of data. In the third way I > use Scan to read hbase directly; the caching size does not affect the result, and I > can always receive 7622 rows of data. > The seriousness of the problem is that the data is lost but there are no > exceptions, so it is difficult to find the reason. > My spark code is like this: > {code:java} > Configuration hbaseConfiguration = HBaseConfiguration.create(); > hbaseConfiguration.set("hbase.zookeeper.property.clientPort", zkPort); > hbaseConfiguration.set("hbase.zookeeper.quorum", zkMaster); > hbaseConfiguration.set("zookeeper.znode.parent", zkPath); > hbaseConfiguration.set(TableInputFormat.INPUT_TABLE,hbaseTableName); > hbaseConfiguration.setLong("hbase.client.scanner.timeout.period",600); > hbaseConfiguration.setLong("hbase.rpc.timeout",600); > final Scan hbaseScan = new Scan(); > hbaseScan.addFamily(familyName); > hbaseScan.setCaching(5);//if Caching is too big, some rowkeys will lost! 
> for(String[] cell:cellNames){ > String column = cell[0]; > hbaseScan.addColumn(familyName,Bytes.toBytes(column)); > } > hbaseScan.setStartRow(Bytes.toBytes(startRowkeyStr)); > hbaseScan.setStopRow(Bytes.toBytes(endRowkeyStr)); > try { > ClientProtos.Scan scanProto = ProtobufUtil.toScan(hbaseScan); > hbaseConfiguration.set(TableInputFormat.SCAN, > Base64.encodeBytes(scanProto.toByteArray())); > JavaPairRDD pairRDD = > jsc.newAPIHadoopRDD( > hbaseConfiguration,TableInputFormat.class, ImmutableBytesWritable.class, > Result.class ); > System.out.println("pairRDD.count(): " + pairRDD.count()); > } > catch (IOException e) { > System.out.println("Scan Exception!! " + e.getMessage()); > } > {code} > My mapreduce code is like this: > {code:java} > static class HbaseMapper extends TableMapper { >@Override protected void map(ImmutableBytesWritable key, Result > value,Mapper.Context context) throws IOException, InterruptedException { > for(Cell cell :value.rawCells()){ > context.write(new ImmutableBytesWritable("A".getBytes()),new > Text("max")); > } >} > } > public static void main(String[] args) throws Exception { > org.apache.hadoop.conf.Configuration hbaseConfiguration = > HBaseConfiguration.create(); > hbaseConfiguration.set("hbase.zookeeper.property.clientPort", zkPort); > hbaseConfiguration.set("hbase.zookeeper.quorum", zkMaster); > hbaseConfiguration.set("zookeeper.znode.parent", zkPath); > hbaseConfiguration.setLong("hbase.client.scanner.timeout.period",600); > hbaseConfiguration.setLong("hbase.rpc.timeout",600); > Job job = Job.getInstance(hbaseConfiguration); > job.setJarByClass(App.class); > List list = new ArrayList(); > Scan scan = new Scan(); > scan.addFamily(Bytes.toBytes(familyName)); > scan.setCaching(5);//if Caching is too big, some rowkeys will lost! 
> for (String[] cell : cellNames) { > String column = cell[0]; > scan.addColumn(familyName,Bytes.toBytes(column)); > } > scan.setStartRow(Bytes.toBytes(startRowkeyStr)); > scan.setStopRow(Bytes.toBytes(endRowkeyStr)); > scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, > Bytes.toBytes(hbaseTableName)); > list.add(scan); > System.out.println("size: "+list.size()); > TableMapReduceUtil.initTableMapperJob(list,HbaseMapper.class,ImmutableBytesWritable.class,Text.class, > job); > job.setMapOutputKeyClass(ImmutableBytesWritable.class); > job.setMapOutputValueClass(Text.class); > job.setOutputKeyClass(ImmutableBytesWritable.class); > job.setOutputValueClass(Text.class); > FileOutputFormat.setOutputPath(job, new Path("maxTestOutput")); > System.exit(job.waitForCompletion(true) ? 0 : 1); > }{code} > The pom.xml for mapreduce code is like this: > [^pom.xml] > Third way code is like this: > {code:java} > public static void main(String[] args) throws Exception{ > org.apache.h
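For context on the figures in the report above: Scan caching sets how many rows each scanner `next()` RPC returns, so the caching value directly determines how many RPCs a full scan makes and how heavy each one is, which is what interacts with `hbase.client.scanner.timeout.period`. The arithmetic below only relates the quoted row count to RPC counts at the two caching values; it is a sanity check, not a diagnosis of the bug:

```java
/**
 * Back-of-the-envelope arithmetic for the report above: with a fixed total
 * row count, Scan caching controls how many rows ride in each scanner RPC.
 * The 7622-row figure comes from the report; the rest is plain math.
 */
class ScanCachingMath {

  /** Number of scanner next() RPCs needed to fetch totalRows at the given caching. */
  static long rpcCount(long totalRows, int caching) {
    return (totalRows + caching - 1) / caching; // ceiling division
  }

  public static void main(String[] args) {
    long rows = 7622;
    // caching=5: many light RPCs; caching=500: a handful of heavy ones.
    System.out.println(rpcCount(rows, 5));   // 1525 round trips
    System.out.println(rpcCount(rows, 500)); // 16 round trips
  }
}
```

Note that the quoted code sets `hbase.client.scanner.timeout.period` and `hbase.rpc.timeout` to 600, which these properties interpret as milliseconds; an unusually short timeout would make heavier batches more fragile, though whether that explains the missing rows is not established in the thread.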
[GitHub] [hbase] Apache-HBase commented on issue #544: HBASE-22917 Proc-WAL roll fails saying someone else has already created log
Apache-HBase commented on issue #544: HBASE-22917 Proc-WAL roll fails saying someone else has already created log
URL: https://github.com/apache/hbase/pull/544#issuecomment-533775164

:confetti_ball: **+1 overall**

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|:--------|:-------|
| :blue_heart: | reexec | 0m 30s | Docker mode activated. |
||| _ Prechecks _ |
| :green_heart: | dupname | 0m 0s | No case conflicting files found. |
| :green_heart: | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| :green_heart: | @author | 0m 0s | The patch does not contain any @author tags. |
| :yellow_heart: | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
||| _ master Compile Tests _ |
| :green_heart: | mvninstall | 7m 2s | master passed |
| :green_heart: | compile | 0m 18s | master passed |
| :green_heart: | checkstyle | 0m 16s | master passed |
| :green_heart: | shadedjars | 5m 2s | branch has no errors when building our shaded downstream artifacts. |
| :green_heart: | javadoc | 0m 17s | master passed |
| :blue_heart: | spotbugs | 0m 38s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| :green_heart: | findbugs | 0m 35s | master passed |
||| _ Patch Compile Tests _ |
| :green_heart: | mvninstall | 5m 22s | the patch passed |
| :green_heart: | compile | 0m 17s | the patch passed |
| :green_heart: | javac | 0m 17s | the patch passed |
| :green_heart: | checkstyle | 0m 14s | the patch passed |
| :green_heart: | whitespace | 0m 0s | The patch has no whitespace issues. |
| :green_heart: | shadedjars | 5m 2s | patch has no errors when building our shaded downstream artifacts. |
| :green_heart: | hadoopcheck | 17m 9s | Patch does not cause any errors with Hadoop 2.8.5 2.9.2 or 3.1.2. |
| :green_heart: | javadoc | 0m 15s | the patch passed |
| :green_heart: | findbugs | 0m 42s | the patch passed |
||| _ Other Tests _ |
| :green_heart: | unit | 3m 27s | hbase-procedure in the patch passed. |
| :green_heart: | asflicense | 0m 12s | The patch does not generate ASF License warnings. |
| | | 53m 22s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.1 Server=19.03.1 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-544/15/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hbase/pull/544 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 73d111a421e6 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/HBase-PreCommit-GitHub-PR_PR-544/out/precommit/personality/provided.sh |
| git revision | master / 96a94ac3d0 |
| Default Java | 1.8.0_181 |
| Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-544/15/testReport/ |
| Max. process+thread count | 283 (vs. ulimit of 1) |
| modules | C: hbase-procedure U: hbase-procedure |
| Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-544/15/console |
| versions | git=2.11.0 maven=2018-06-17T18:33:14Z) findbugs=3.1.11 |
| Powered by | Apache Yetus 0.11.0 https://yetus.apache.org |

This message was automatically generated.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Updated] (HBASE-23062) Using TableInputFormat to read data from HBase: when the Scan.setCaching(size) value is too big, some rowkeys are lost without exceptions
[ https://issues.apache.org/jira/browse/HBASE-23062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhanxiongWang updated HBASE-23062:
--
Description:

I ran the experiment in two ways: one reading HBase from Spark, the other from MapReduce. In both cases, increasing the scan caching size loses data. To be more precise, when I set scan.setCaching(5) I receive 7622 rows of data, but when I set scan.setCaching(500) I receive only 4226 rows. What makes this serious is that the rows are lost without any exception being thrown, so the cause is difficult to find.

My Spark code is like this:

{code:java}
Configuration hbaseConfiguration = HBaseConfiguration.create();
hbaseConfiguration.set("hbase.zookeeper.property.clientPort", zkPort);
hbaseConfiguration.set("hbase.zookeeper.quorum", zkMaster);
hbaseConfiguration.set("zookeeper.znode.parent", zkPath);
hbaseConfiguration.set(TableInputFormat.INPUT_TABLE, hbaseTableName);
hbaseConfiguration.setLong("hbase.client.scanner.timeout.period", 600);
hbaseConfiguration.setLong("hbase.rpc.timeout", 600);
final Scan hbaseScan = new Scan();
hbaseScan.addFamily(familyName);
hbaseScan.setCaching(5); // if caching is too big, some rowkeys will be lost!
for (String[] cell : cellNames) {
    String column = cell[0];
    hbaseScan.addColumn(familyName, Bytes.toBytes(column));
}
hbaseScan.setStartRow(Bytes.toBytes(startRowkeyStr));
hbaseScan.setStopRow(Bytes.toBytes(endRowkeyStr));
try {
    ClientProtos.Scan scanProto = ProtobufUtil.toScan(hbaseScan);
    hbaseConfiguration.set(TableInputFormat.SCAN, Base64.encodeBytes(scanProto.toByteArray()));
    JavaPairRDD<ImmutableBytesWritable, Result> pairRDD = jsc.newAPIHadoopRDD(
        hbaseConfiguration, TableInputFormat.class, ImmutableBytesWritable.class, Result.class);
    System.out.println("pairRDD.count(): " + pairRDD.count());
} catch (IOException e) {
    System.out.println("Scan Exception!! " + e.getMessage());
}
{code}

My MapReduce code is like this:

{code:java}
static class HbaseMapper extends TableMapper<ImmutableBytesWritable, Text> {
    @Override
    protected void map(ImmutableBytesWritable key, Result value, Context context)
            throws IOException, InterruptedException {
        for (Cell cell : value.rawCells()) {
            context.write(new ImmutableBytesWritable("A".getBytes()), new Text("max"));
        }
    }
}

public static void main(String[] args) throws Exception {
    org.apache.hadoop.conf.Configuration hbaseConfiguration = HBaseConfiguration.create();
    hbaseConfiguration.set("hbase.zookeeper.property.clientPort", zkPort);
    hbaseConfiguration.set("hbase.zookeeper.quorum", zkMaster);
    hbaseConfiguration.set("zookeeper.znode.parent", zkPath);
    hbaseConfiguration.setLong("hbase.client.scanner.timeout.period", 600);
    hbaseConfiguration.setLong("hbase.rpc.timeout", 600);
    Job job = Job.getInstance(hbaseConfiguration);
    job.setJarByClass(App.class);
    List<Scan> list = new ArrayList<>();
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes(familyName));
    scan.setCaching(5); // if caching is too big, some rowkeys will be lost!
    for (String[] cell : cellNames) {
        String column = cell[0];
        scan.addColumn(Bytes.toBytes(familyName), Bytes.toBytes(column));
    }
    scan.setStartRow(Bytes.toBytes(startRowkeyStr));
    scan.setStopRow(Bytes.toBytes(endRowkeyStr));
    scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, Bytes.toBytes(hbaseTableName));
    list.add(scan);
    System.out.println("size: " + list.size());
    TableMapReduceUtil.initTableMapperJob(list, HbaseMapper.class, ImmutableBytesWritable.class, Text.class, job);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(ImmutableBytesWritable.class);
    job.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(job, new Path("maxTestOutput"));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
{code}

The pom.xml for the MapReduce code is attached: [^pom.xml]
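The caching value only controls how many rows each scanner round trip fetches; a correct scan returns the same row set regardless of it. The toy model below (plain Java, no HBase dependencies; the class name and row counts are invented for illustration) sketches that invariant: a larger batch size means fewer round trips, never fewer rows. Separately, one thing worth checking in the code above is the timeout configuration: hbase.client.scanner.timeout.period is specified in milliseconds, so setLong(..., 600) requests a 600 ms scanner timeout, which is very short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model (no HBase involved): a scanner hands back rows in
// batches of `caching` rows per round trip, the way setCaching() drives the
// real client. A correct paging loop yields every row no matter the batch size.
public class CachingModel {
    static int rpcs; // round trips used by the last scan

    static List<Integer> scanAll(int totalRows, int caching) {
        List<Integer> rows = new ArrayList<>();
        rpcs = 0;
        int next = 0;
        while (next < totalRows) {
            int end = Math.min(next + caching, totalRows); // one batch ("RPC")
            for (int r = next; r < end; r++) {
                rows.add(r);
            }
            next = end;
            rpcs++;
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(scanAll(7622, 5).size() + " rows in " + rpcs + " trips");   // 7622 rows in 1525 trips
        System.out.println(scanAll(7622, 500).size() + " rows in " + rpcs + " trips"); // 7622 rows in 16 trips
    }
}
```

If the row count did change with the batch size, as reported above, the loss would have to come from the real client skipping or abandoning batches, which is why a bug report like this one focuses on the caching setting.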
[jira] [Created] (HBASE-23062) Using TableInputFormat to read data from HBase: when the Scan.setCaching(size) value is too big, some rowkeys are lost without exceptions
ZhanxiongWang created HBASE-23062:
--
Summary: Using TableInputFormat to read data from HBase: when the Scan.setCaching(size) value is too big, some rowkeys are lost without exceptions
Key: HBASE-23062
URL: https://issues.apache.org/jira/browse/HBASE-23062
Project: HBase
Issue Type: Bug
Affects Versions: 0.98.6.1
Reporter: ZhanxiongWang
Attachments: pom.xml

-- This message was sent by Atlassian Jira (v8.3.4#803005)