[jira] [Commented] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond
[ https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769397#comment-15769397 ] Jingcheng Du commented on HBASE-16981: -- Thanks a lot Huaxiang! I will. > Expand Mob Compaction Partition policy from daily to weekly, monthly and > beyond > --- > > Key: HBASE-16981 > URL: https://issues.apache.org/jira/browse/HBASE-16981 > Project: HBase > Issue Type: New Feature > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-16981.master.001.patch, > HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, > Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf > > > Today the mob region holds all mob files for all regions. With the daily > partition mob compaction policy, after major mob compaction there is still > one file per region per day. Given there are 365 days in a year, that is at > least 365 files per region. Since HDFS has a limitation on the number of > files under one folder, this will not scale if there are lots of regions. To > reduce the mob file count, we want to introduce other partition policies, > such as weekly and monthly, to compact mob files within one week or month > into one file. This jira is created to track this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phil Yang updated HBASE-17314: -- Attachment: HBASE-17314.v05.patch Let's run a pre-commit test. Will push this patch if nothing goes wrong. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch, > HBASE-17314.v05.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phil Yang updated HBASE-17314: -- Status: Patch Available (was: Reopened) > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch, > HBASE-17314.v05.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17358) Unify backoff calculation
Duo Zhang created HBASE-17358: - Summary: Unify backoff calculation Key: HBASE-17358 URL: https://issues.apache.org/jira/browse/HBASE-17358 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang For the async table the sleep pause is determined only by the retry number; at a minimum we should also take the exception type into account (MultiActionResultTooLarge, CallQueueTooBig, ...). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
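[Editor's note] A minimal sketch of what exception-aware backoff could look like. The helper and the pushback multiplier below are assumptions, not the actual patch; HConstants.RETRY_BACKOFF is the real retry table, and MultiActionResultTooLarge really does extend RetryImmediatelyException:
{code}
import org.apache.hadoop.hbase.CallQueueTooBigException;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.RetryImmediatelyException;

public final class BackoffSketch {
  /** Scale the base pause by the retry table, and also by the exception type. */
  static long getPauseTime(long basePauseMs, int retries, Throwable error) {
    if (error instanceof RetryImmediatelyException) {
      return 0; // e.g. MultiActionResultTooLarge: no sleep, retry right away
    }
    long pause = basePauseMs;
    if (error instanceof CallQueueTooBigException) {
      pause *= 10; // hypothetical multiplier: the server asked us to back off
    }
    int index = Math.min(retries, HConstants.RETRY_BACKOFF.length - 1);
    return pause * HConstants.RETRY_BACKOFF[index];
  }
}
{code}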
[jira] [Commented] (HBASE-17149) Procedure v2 - Fix nonce submission
[ https://issues.apache.org/jira/browse/HBASE-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769312#comment-15769312 ] stack commented on HBASE-17149: --- Giving up on backport. Differences are too extreme. You might have better luck when you get back [~syuanjiang] > Procedure v2 - Fix nonce submission > --- > > Key: HBASE-17149 > URL: https://issues.apache.org/jira/browse/HBASE-17149 > Project: HBase > Issue Type: Sub-task > Components: proc-v2 >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 1.2.4 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi > Fix For: 2.0.0 > > Attachments: HBASE-17149.master.001.patch, > HBASE-17149.master.002.patch, HBASE-17149.master.002.patch, > HBASE-17149.master.002.patch, HBASE-17149.master.003.patch, nonce.patch > > > Instead of having all the logic in submitProcedure(), split it into > registerNonce() + submitProcedure(). > This way we can avoid calling the coprocessor twice and have clean submit > logic, knowing that there will be only one submission. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
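[Editor's note] An illustration of the split described above, with assumed method shapes; the actual change is in the attached patches:
{code}
// Hypothetical sketch: detect duplicate (nonce) submissions up front so the
// coprocessor hook and submitProcedure() each run at most once.
long submitWithNonce(Procedure proc, long nonceGroup, long nonce) {
  NonceKey nonceKey = procExecutor.createNonceKey(nonceGroup, nonce);
  long existingProcId = procExecutor.registerNonce(nonceKey); // assumed signature
  if (existingProcId >= 0) {
    return existingProcId; // duplicate submission: reuse the existing procedure
  }
  // First submission: safe to invoke the pre-submit coprocessor hook exactly once.
  return procExecutor.submitProcedure(proc, nonceKey);
}
{code}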
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769305#comment-15769305 ] Phil Yang commented on HBASE-17314: --- {quote} This looks like it could be package private rather than public: 2344 public ReplicationSourceService getReplicationSourceService() { {quote} It is used in org.apache.hadoop.hbase.replication.regionserver.TestGlobalThrottler, so it has to be public. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-17262: - Resolution: Fixed Status: Resolved (was: Patch Available) > Refactor RpcServer so as to make it extendable and/or pluggable > --- > > Key: HBASE-17262 > URL: https://issues.apache.org/jira/browse/HBASE-17262 > Project: HBase > Issue Type: Sub-task > Components: rpc >Affects Versions: 2.0.0 >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0 > > Attachments: HBASE-17262.master.V1.patch, > HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, > HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell
[ https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769298#comment-15769298 ] ChiaPing Tsai commented on HBASE-17160: --- It works for me. Thanks a lot. > Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to > shell > - > > Key: HBASE-17160 > URL: https://issues.apache.org/jira/browse/HBASE-17160 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17160.addendum.txt, HBASE-17160.master.001.patch, > HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, > minor_hbase.png, untangled_hbase.png > > > Very minor untangling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769283#comment-15769283 ] binlijin commented on HBASE-17262: -- The UT failure is unrelated to this patch. > Refactor RpcServer so as to make it extendable and/or pluggable > --- > > Key: HBASE-17262 > URL: https://issues.apache.org/jira/browse/HBASE-17262 > Project: HBase > Issue Type: Sub-task > Components: rpc >Affects Versions: 2.0.0 >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0 > > Attachments: HBASE-17262.master.V1.patch, > HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, > HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769286#comment-15769286 ] stack commented on HBASE-17314: --- Go ahead and push patch w/ fix I'd say [~yangzhe1991] when you have one. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769287#comment-15769287 ] binlijin commented on HBASE-17262: -- Pushed to master. > Refactor RpcServer so as to make it extendable and/or pluggable > --- > > Key: HBASE-17262 > URL: https://issues.apache.org/jira/browse/HBASE-17262 > Project: HBase > Issue Type: Sub-task > Components: rpc >Affects Versions: 2.0.0 >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0 > > Attachments: HBASE-17262.master.V1.patch, > HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, > HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator
[ https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-17334: -- Attachment: HBASE-17334-v2.patch Add test for locate after. > Add locate row before/after support for AsyncRegionLocator > -- > > Key: HBASE-17334 > URL: https://issues.apache.org/jira/browse/HBASE-17334 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17334-v1.patch, HBASE-17334-v2.patch, > HBASE-17334.patch > > > Now we only have a getPreviousRegionLocation method, which is used only for > reverse scans, and it is not perfect as it cannot deal with region merges. As > we want to add inclusive/exclusive support for the start row and end row of a > scan, we need to implement a general locate-row-before/after method for > AsyncRegionLocator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769280#comment-15769280 ] Phil Yang commented on HBASE-17314: --- Thank you. The reason is that after HBASE-11392, adding a peer goes through the master, so we must start the cluster first and then add the peer. Before that change it was OK to add the peer first, because it only added a znode on ZK.
{code}
@@ -94,12 +94,13 @@ public class TestGlobalThrottler {
     ReplicationAdmin admin1 = new ReplicationAdmin(conf1);
     ReplicationPeerConfig rpc = new ReplicationPeerConfig();
     rpc.setClusterKey(utility2.getClusterKey());
-    admin1.addPeer("peer1", rpc, null);
-    admin1.addPeer("peer2", rpc, null);
-    admin1.addPeer("peer3", rpc, null);
     utility1.startMiniCluster(1, 1);
     utility2.startMiniCluster(1, 1);
+
+    admin1.addPeer("peer1", rpc, null);
+    admin1.addPeer("peer2", rpc, null);
+    admin1.addPeer("peer3", rpc, null);
   }
{code}
Will upload a new patch with your suggestions. Thanks. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769235#comment-15769235 ] stack commented on HBASE-17314: --- [~yangzhe1991] Here boss https://builds.apache.org/view/H-L/view/HBase/job/PreCommit-HBASE-Build/5013/testReport/org.apache.hadoop.hbase.replication.regionserver/TestGlobalThrottler/org_apache_hadoop_hbase_replication_regionserver_TestGlobalThrottler/ > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-16859) Use ByteBuffer pool for non-java clients specifically for scans/gets
[ https://issues.apache.org/jira/browse/HBASE-16859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769229#comment-15769229 ] stack commented on HBASE-16859: --- Go for it [~ram_krish] (I have no numbers on native vs non-native clients) > Use ByteBuffer pool for non-java clients specifically for scans/gets > > > Key: HBASE-16859 > URL: https://issues.apache.org/jira/browse/HBASE-16859 > Project: HBase > Issue Type: Sub-task >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: HBASE-16859_V1.patch, HBASE-16859_V2.patch, > HBASE-16859_V2.patch, HBASE-16859_V4.patch, HBASE-16859_V5.patch, > HBASE-16859_V6.patch > > > In case of non-java clients we still write the results and header into an > on-demand byte[]. This can be changed to use the BBPool (onheap or offheap > buffer?). > But the basic problem is to identify whether the response is for scans/gets. > - One easy way is to use the MethodDescriptor per Call and use the name of > the MethodDescriptor to identify a scan/get. But this will pollute RpcServer > by checking for scan/get type responses. > - Another way is to always set the result to cellScanner, but we know that > isClientCellBlockSupported is going to be false for non-PB clients. So ignore > the cellScanner and go ahead with the results in PB. But this is not clean. > - The third is that we already have an RpcCallContext being passed to the RS. > In case of scans/gets/multiGets we already set an RpcCallback for the shipped > call. So here, on response, we can check whether the callback is not null and > check isClientCellBlockSupported. In this case we can get the BB from the pool > and write the result and header to that BB. Maybe this looks cleanest? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
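[Editor's note] A rough sketch of the third option above, assuming the shape of the RPC call handling; the pool method and helper names below are assumptions, not committed API:
{code}
// Illustrative flow inside the response-building path of RpcServer:
// a non-null shipped-call callback marks a scan/get/multiGet, and
// isClientCellBlockSupported() == false marks a non-java (PB-only) client.
if (call.getCallBack() != null && !call.isClientCellBlockSupported()) {
  ByteBuffer bb = reservoir.getBuffer(estimatedResponseSize); // assumed pool API
  writeHeaderAndResult(bb, header, pbResult);                 // hypothetical helper
  call.setResponse(bb);
} else {
  // existing path: on-demand byte[] / cell block
}
{code}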
[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769227#comment-15769227 ] Hadoop QA commented on HBASE-17262: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 34s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 0s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 35s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 31s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 44s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 58s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 50s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 48s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 33s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 95m 46s {color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 32s {color} | {color:green} hbase-it in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 139m 39s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844345/HBASE-17262.master.V5.patch | | JIRA Issue | HBASE-17262 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 56ca5999b2df 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / d787155 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/5019/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/5019/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769210#comment-15769210 ] Phil Yang commented on HBASE-17314: --- bq. TestGlobalThrottler hangs in master build. Which build did it hang in? Any logs for the test? Thanks. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory, which will increase GC pressure and may even > cause OOM, because by default we read up to 64MB of entries into the buffer > for each source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769199#comment-15769199 ] Yu Li commented on HBASE-17345: --- Overall LGTM, some comments: About {{ConnectionUtils}}: In {{voidBatch}} and {{voidBatchAll}}, mind explaining why {{table.batch(actions)}} rather than {{table.batch(actions)}}? About {{AsyncTableBase}}: 1. Add javadoc for the newly added methods: exists(List)/existsAll, put(List)/putAll, delete(List)/deleteAll, batch(List)/batchAll? 2. Add more UT cases to cover them? About {{TestAsyncGetMultiThread}}: 1. Now it makes chaos for each split key, including split-and-compact, balance and move, with a 5-second sleep in between each, which makes the test run for over 2 min. Maybe simplify it a little bit to make the test finish faster? 2. Also, the name feels confusing; maybe TestAsyncGetWithMultiThread is better? Thanks. > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add support for general batch operations based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
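[Editor's note] For context, a hedged usage sketch of the batch methods under review; the generic signatures are guessed from the comment, not confirmed API:
{code}
List<Row> actions = new ArrayList<>();
actions.add(new Get(Bytes.toBytes("row1")));
actions.add(new Put(Bytes.toBytes("row2"))
    .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("cq"), Bytes.toBytes("v")));

// One future per action...
List<CompletableFuture<Object>> futures = table.batch(actions);
// ...or a single future that completes once every action has finished.
CompletableFuture<List<Object>> all = table.batchAll(actions);
{code}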
[jira] [Commented] (HBASE-17355) Create a simplified version of flush scanner
[ https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769194#comment-15769194 ] stack commented on HBASE-17355: --- I like the reduce overhead by 50% story > Create a simplified version of flush scanner > --- > > Key: HBASE-17355 > URL: https://issues.apache.org/jira/browse/HBASE-17355 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: HBASE-17354.patch, after patch.png, before patch.png > > > Currently we use StoreScanner for performing flushes, which actually goes > row by row. That is probably not needed, and we could instead use a simple > loop that collects the cells and writes them to the file. The write path > already performs the required sanity checks, so the store scanner does not > need to repeat them. > Also, the limit retrieved in one next() call could be set to the configured > block size, as we do for compaction. > Are there any filters (I mean any version checks or deletions) that we need > to apply during flush? If so, this simplified version will not work. I may > be missing something; if so, we need to identify those cases and add them > here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell
[ https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-17160: -- Attachment: HBASE-17160.addendum.txt This worked for me [~chia7712]. Does it work for you? If so, I'll commit. Thanks. > Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to > shell > - > > Key: HBASE-17160 > URL: https://issues.apache.org/jira/browse/HBASE-17160 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17160.addendum.txt, HBASE-17160.master.001.patch, > HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, > minor_hbase.png, untangled_hbase.png > > > Very minor untangling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17160) Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to shell
[ https://issues.apache.org/jira/browse/HBASE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769077#comment-15769077 ] ChiaPing Tsai commented on HBASE-17160: --- I ran "mvn clean test -Dtest=org.apache.hadoop.hbase.client.TestRpcControllerFactory -X" to check the classpath, and found that it includes hbase-hadoop-compat/target/classes but not hbase-hadoop-compat/target/test-classes. The reason for the omission could be that transitive dependencies don't include the test scope automatically. > Undo unnecessary inter-module dependency; spark to hbase-it and hbase-it to > shell > - > > Key: HBASE-17160 > URL: https://issues.apache.org/jira/browse/HBASE-17160 > Project: HBase > Issue Type: Bug >Reporter: stack >Assignee: stack >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17160.master.001.patch, > HBASE-17160.master.002.patch, HBASE-17160.master.002.patch, hbase.png, > minor_hbase.png, untangled_hbase.png > > > Very minor untangling. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17355) Create a simplified version of flush scanner
[ https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769018#comment-15769018 ] ramkrishna.s.vasudevan commented on HBASE-17355: I don't think it is as simple as in the patch. It may need some more tweaks, but yes, we can reduce the number of comparisons and cut at least 50% of the overhead here. > Create a simplified version of flush scanner > --- > > Key: HBASE-17355 > URL: https://issues.apache.org/jira/browse/HBASE-17355 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: HBASE-17354.patch, after patch.png, before patch.png > > > Currently we use StoreScanner for performing flushes, which actually goes > row by row. That is probably not needed, and we could instead use a simple > loop that collects the cells and writes them to the file. The write path > already performs the required sanity checks, so the store scanner does not > need to repeat them. > Also, the limit retrieved in one next() call could be set to the configured > block size, as we do for compaction. > Are there any filters (I mean any version checks or deletions) that we need > to apply during flush? If so, this simplified version will not work. I may > be missing something; if so, we need to identify those cases and add them > here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
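[Editor's note] A sketch of the "simple loop" flush the issue describes, assuming no version/delete filtering is needed at flush time, which is exactly the open question above:
{code}
// Illustration: drain the memstore scanner straight into the store file
// writer, skipping StoreScanner's row-by-row matching entirely.
Cell cell;
while ((cell = memstoreScanner.next()) != null) {
  writer.append(cell); // the write path performs its own sanity checks
}
writer.close();
{code}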
[jira] [Commented] (HBASE-17101) FavoredNodes should not apply to system tables
[ https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15769016#comment-15769016 ] Hadoop QA commented on HBASE-17101: --- | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 9s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 48s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 41s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 43s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 14s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 49s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 89m 11s {color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 15s {color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 126m 0s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844340/HBASE-17101.master.003.patch | | JIRA Issue | HBASE-17101 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 76537c1822ac 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / d787155 | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/5018/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5018/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > FavoredNodes should not apply to system tables > -- > > Key: HBASE-17101 > URL: https://issues.apache.org/jira/browse/HBASE-17101 > Project: HBase > Issue Type: Sub-task >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0 > > Attachments: HBASE-17101.master.001.patch, > HBASE-17101.master.002.
[jira] [Commented] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond
[ https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768994#comment-15768994 ] Hadoop QA commented on HBASE-16981: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:blue}0{color} | {color:blue} rubocop {color} | {color:blue} 0m 0s {color} | {color:blue} rubocop was not available. {color} | | {color:blue}0{color} | {color:blue} ruby-lint {color} | {color:blue} 0m 0s {color} | {color:blue} Ruby-lint was not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 27s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 34s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 11s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 15s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 44s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 11s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 6s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 25s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 13s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 12s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 47s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 30m 17s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 13s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} javadoc {color} | {color:red} 0m 37s {color} | {color:red} hbase-server generated 1 new + 1 unchanged - 0 fixed = 2 total (was 1) {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 8s {color} | {color:green} hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 97m 22s {color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 7m 33s {color} | {color:green} hbase-shell in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 39s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 158m 34s {color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844330/HBASE-16981.master.003.patch | | JIRA Issue | HBASE-16981 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile rubocop ruby_lint | | uname | Linux 13a0b7784fe1 3.13.0-93-generic #140-Ubuntu SMP Mon Jul 18 21:21:05 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component
[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-17262: - Component/s: (was: Performance) > Refactor RpcServer so as to make it extendable and/or pluggable > --- > > Key: HBASE-17262 > URL: https://issues.apache.org/jira/browse/HBASE-17262 > Project: HBase > Issue Type: Sub-task > Components: rpc >Affects Versions: 2.0.0 >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0 > > Attachments: HBASE-17262.master.V1.patch, > HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, > HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-17262: - Attachment: HBASE-17262.master.V5.patch > Refactor RpcServer so as to make it extendable and/or pluggable > --- > > Key: HBASE-17262 > URL: https://issues.apache.org/jira/browse/HBASE-17262 > Project: HBase > Issue Type: Sub-task > Components: Performance, rpc >Affects Versions: 2.0.0 >Reporter: binlijin >Assignee: binlijin > Fix For: 2.0.0 > > Attachments: HBASE-17262.master.V1.patch, > HBASE-17262.master.V2.patch, HBASE-17262.master.V3.patch, > HBASE-17262.master.V4.patch, HBASE-17262.master.V5.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17101) FavoredNodes should not apply to system tables
[ https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768796#comment-15768796 ] Thiruvel Thirumoolan commented on HBASE-17101: -- Updated patch with the review comments from Review Board addressed. > FavoredNodes should not apply to system tables > -- > > Key: HBASE-17101 > URL: https://issues.apache.org/jira/browse/HBASE-17101 > Project: HBase > Issue Type: Sub-task >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0 > > Attachments: HBASE-17101.master.001.patch, > HBASE-17101.master.002.patch, HBASE-17101.master.003.patch, > HBASE_17101_rough_draft.patch > > > As described in the doc (see HBASE-15531), we would like to start with user > tables for favored nodes. This task ensures FN does not apply to system > tables. > System tables are in memory and won't benefit from favored nodes. Since we > also maintain FN information for user regions in meta, it keeps the > implementation simpler to ignore system tables for the first iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17101) FavoredNodes should not apply to system tables
[ https://issues.apache.org/jira/browse/HBASE-17101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thiruvel Thirumoolan updated HBASE-17101: - Attachment: HBASE-17101.master.003.patch > FavoredNodes should not apply to system tables > -- > > Key: HBASE-17101 > URL: https://issues.apache.org/jira/browse/HBASE-17101 > Project: HBase > Issue Type: Sub-task >Reporter: Thiruvel Thirumoolan >Assignee: Thiruvel Thirumoolan > Fix For: 2.0.0 > > Attachments: HBASE-17101.master.001.patch, > HBASE-17101.master.002.patch, HBASE-17101.master.003.patch, > HBASE_17101_rough_draft.patch > > > As described in the doc (see HBASE-15531), we would like to start with user > tables for favored nodes. This task ensures FN does not apply to system > tables. > System tables are in memory and won't benefit from favored nodes. Since we > also maintain FN information for user regions in meta, it keeps the > implementation simpler to ignore system tables for the first iterations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768767#comment-15768767 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK7 #82 (See [https://builds.apache.org/job/HBase-1.2-JDK7/82/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 18dc7386bc9adff834db851a38306989fb3fd4a6) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
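[Editor's note] The short-term fix described above, sketched; the timeout constant is an assumption, and ReplicationEndpoint extends Guava's Service, whose stop() returns a future of the service state:
{code}
// Bound the wait on endpoint shutdown so a stuck stop() can't hang the
// calling thread -- which may be the ZK event thread.
Future<Service.State> stopFuture = replicationEndpoint.stop();
try {
  stopFuture.get(terminationTimeoutMs, TimeUnit.MILLISECONDS); // assumed config
} catch (TimeoutException e) {
  LOG.warn("Replication endpoint did not shut down within " + terminationTimeoutMs
      + "ms, proceeding with source termination", e);
} catch (InterruptedException | ExecutionException e) {
  LOG.warn("Error stopping replication endpoint", e);
}
{code}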
[jira] [Commented] (HBASE-17257) Add column-aliasing capability to hbase-client
[ https://issues.apache.org/jira/browse/HBASE-17257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768757#comment-15768757 ] Daniel Vimont commented on HBASE-17257: --- Patch v5 now available for perusal on Review Board: https://reviews.apache.org/r/54635/ > Add column-aliasing capability to hbase-client > -- > > Key: HBASE-17257 > URL: https://issues.apache.org/jira/browse/HBASE-17257 > Project: HBase > Issue Type: New Feature > Components: Client >Affects Versions: 2.0.0 >Reporter: Daniel Vimont >Assignee: Daniel Vimont > Labels: features > Attachments: HBASE-17257-v2.patch, HBASE-17257-v3.patch, > HBASE-17257-v4.patch, HBASE-17257-v5.patch, HBASE-17257.patch > > > Review Board link: https://reviews.apache.org/r/54635/ > Column aliasing will provide the option for a 1, 2, or 4 byte alias value to > be stored in each cell of an "alias enabled" column-family, in place of the > full-length column-qualifier. Aliasing is intended to operate completely > invisibly to the end-user developer, with absolutely no "awareness" of > aliasing required to be coded into a front-end application. No new public > hbase-client interfaces are to be introduced, and only a few new public > methods should need to be added to existing interfaces, primarily to allow an > administrator to designate that a new column-family is to be alias-enabled by > setting its aliasSize attribute to 1, 2, or 4. > To facilitate such functionality, new subclasses of HTable, > BufferedMutatorImpl, and HTableMultiplexer are to be provided. The overriding > methods of these new subclasses will invoke methods of the new AliasManager > class to facilitate qualifier-to-alias conversions (for user-submitted Gets, > Scans, and Mutations) and alias-to-qualifier conversions (for Results > returned from HBase) for any Table that has one or more alias-enabled column > families. All conversion logic will be encapsulated in the new AliasManager > class, and all qualifier-to-alias mappings will be persisted in a new > aliasMappingTable in a new, reserved namespace. > An informal polling of HBase users at HBaseCon East and at the > Strata/Hadoop-World conference in Sept. 2016 showed that Column Aliasing > could be a popular enhancement to standard HBase functionality, due to the > fact that full column-qualifiers are stored in each cell, and reducing this > qualifier storage requirement down to 1, 2, or 4 bytes per cell could prove > beneficial in terms of reduced storage and bandwidth needs. Aliasing is > intended chiefly for column-families which are of the "narrow and tall" > variety (i.e., that are designed to use relatively few distinct > column-qualifiers throughout a large number of rows, throughout the lifespan > of the column-family). A column-family that is set up with an alias-size of 1 > byte can contain up to 255 unique column-qualifiers; a 2 byte alias-size > allows for up to 65,535 unique column-qualifiers; and a 4 byte alias-size > allows for up to 4,294,967,295 unique column-qualifiers. > Fuller specifications will be entered into the comments section below. Note > that it may well not be viable to add aliasing support in the new "async" > classes that appear to be currently under development. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
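[Editor's note] A hedged sketch of what enabling aliasing might look like for an administrator, based on the aliasSize attribute named in the description; the attribute key string is an assumption:
{code}
HColumnDescriptor family = new HColumnDescriptor("d");
family.setValue("aliasSize", "2"); // 2-byte aliases: up to 65,535 distinct qualifiers
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("t"));
table.addFamily(family);
admin.createTable(table);
{code}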
[jira] [Updated] (HBASE-15130) Backport 0.98 Scan different TimeRange for each column family
[ https://issues.apache.org/jira/browse/HBASE-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-15130: --- Fix Version/s: (was: 0.98.24) > Backport 0.98 Scan different TimeRange for each column family > -- > > Key: HBASE-15130 > URL: https://issues.apache.org/jira/browse/HBASE-15130 > Project: HBase > Issue Type: Bug > Components: Client, regionserver, Scanners >Affects Versions: 0.98.17 >Reporter: churro morales >Assignee: churro morales > Attachments: HBASE-15130-0.98.patch, HBASE-15130-0.98.v1.patch, > HBASE-15130-0.98.v1.patch, HBASE-15130-0.98.v2.patch, > HBASE-15130-0.98.v3.patch, HBASE-15130-0.98.v4.patch > > > branch 98 version backport for HBASE-14355 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16663) JMX ConnectorServer stopped when unauthorized user tries to stop HM/RS/cluster
[ https://issues.apache.org/jira/browse/HBASE-16663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-16663: --- Resolution: Fixed Status: Resolved (was: Patch Available) > JMX ConnectorServer stopped when unauthorized user tries to stop HM/RS/cluster > > > Key: HBASE-16663 > URL: https://issues.apache.org/jira/browse/HBASE-16663 > Project: HBase > Issue Type: Bug > Components: metrics, security >Reporter: Pankaj Kumar >Assignee: Pankaj Kumar >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.1.9, 0.98.24, 1.2.4 > > Attachments: 16663-branch-1.1.00.patch, 16663.branch-1.1.patch, > 16663.branch-1.1.patch, HBASE-16663-0.98-V4.patch, HBASE-16663-0.98.patch, > HBASE-16663-V2.patch, HBASE-16663-V3.patch, HBASE-16663-V4.patch, > HBASE-16663-branch-1.patch, HBASE-16663.patch > > > After HBASE-16284, an unauthorized user will not be allowed to stop the > HM/RS/cluster, but while executing "cpHost.preStopMaster()", the ConnectorServer > will be stopped before AccessController validation. > hbase-site.xml, > {noformat} > <property> > <name>hbase.coprocessor.master.classes</name> > <value>org.apache.hadoop.hbase.JMXListener,org.apache.hadoop.hbase.security.access.AccessController</value> > </property> > <property> > <name>hbase.coprocessor.regionserver.classes</name> > <value>org.apache.hadoop.hbase.JMXListener,org.apache.hadoop.hbase.security.access.AccessController</value> > </property> > {noformat} > HBaseAdmin.stopMaster(), > {noformat} > 2016-09-20 21:12:26,796 INFO > [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] > hbase.JMXListener: ConnectorServer stopped! > 2016-09-20 21:13:55,380 WARN > [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] > security.ShellBasedUnixGroupsMapping: got exception trying to get groups for > user P72981 > ExitCodeException exitCode=1: id: P72981: No such user > 2016-09-20 21:14:00,495 ERROR > [RpcServer.FifoWFPBQ.priority.handler=19,queue=1,port=16000] > master.MasterRpcServices: Exception occurred while stopping master > org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient > permissions for user 'P72981' (global, action=ADMIN) > at > org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:546) > at > org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522) > at > org.apache.hadoop.hbase.security.access.AccessController.preStopMaster(AccessController.java:1297) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost$68.call(MasterCoprocessorHost.java:821) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1188) > at > org.apache.hadoop.hbase.master.MasterCoprocessorHost.preStopMaster(MasterCoprocessorHost.java:817) > at org.apache.hadoop.hbase.master.HMaster.stopMaster(HMaster.java:2352) > at > org.apache.hadoop.hbase.master.MasterRpcServices.stopMaster(MasterRpcServices.java:1364) > {noformat} > HBaseAdmin.stopRegionServer(rs-host-port), > {noformat} > 2016-09-20 20:59:01,234 INFO > [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] > hbase.JMXListener: ConnectorServer stopped!
> 2016-09-20 20:59:01,250 WARN > [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] > security.ShellBasedUnixGroupsMapping: got exception trying to get groups for > user P72981 > ExitCodeException exitCode=1: id: P72981: No such user > 2016-09-20 20:59:01,253 WARN > [RpcServer.FifoWFPBQ.priority.handler=18,queue=0,port=16020] > regionserver.HRegionServer: The region server did not stop > org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient > permissions for user 'P72981' (global, action=ADMIN) > at > org.apache.hadoop.hbase.security.access.AccessController.requireGlobalPermission(AccessController.java:546) > at > org.apache.hadoop.hbase.security.access.AccessController.requirePermission(AccessController.java:522) > at > org.apache.hadoop.hbase.security.access.AccessController.preStopRegionServer(AccessController.java:2501) > at > org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost$1.call(RegionServerCoprocessorHost.java:84) > at > org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.execOperation(RegionServerCoprocessorHost.java:256) > at > org.apache.hadoop.hbase.regionserver.RegionServerCoprocessorHost.preStop(RegionServerCoprocessorHost.java:80) > at > org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1905) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.stopServer(RSRpcServices.java:1961) > {noformat} > HBaseAdmin.shutdown(), > {noformat} > 2016-09-21 12:09:08,259 IN
[jira] [Commented] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768749#comment-15768749 ] Duo Zhang commented on HBASE-17345: --- {quote} We take startLogErrorsCnt as a param but ignore it? {quote} It is just for debugging... Will do some cleanup in the next patch. {quote} You make a new Action from passed-in Action because you don't want to modify passed-in params? {quote} The action passed in is a Row, i.e., a Get, Put, Delete, etc. We use it to construct an Action object. It records the originalIndex of the action and also carries the nonce. {quote} super nit: you can presize the following 148 this.action2Errors = new IdentityHashMap<>(); {quote} If there is no error then the map will remain empty after we finish. I think this is the common case? {quote} This is just to log? 208 long currentTime = System.currentTimeMillis(); i.e. all timing is with nanos but millis is just for logging? {quote} Just following the old log pattern. It is used to construct the error message of RetriesExhaustedException. And I think it is reasonable, as it is more friendly for the user to get a date (think of PrintGCTimeStamps vs. PrintGCDateStamps). {quote} What do you see AsyncBatchRpcRetryingCaller replacing in our current stack? It seems to do AP and a bunch of our Callable infra. Should AsyncBatchRpcRetryingCaller implement Callable? Or what you thinking? {quote} I plan to use it to replace AsyncProcess. And there is no callable in the current client implementation stack (or maybe some simple ones, see AsyncSingleRequestRpcRetryingCaller). With this patch, I think most retrying callers for the async table are in place. The exceptions are read replica support (HBASE-17356) and endpoint support (HBASE-17346). And we still need to improve the scan implementation (mvcc, inclusive/exclusive of start row and end row, etc.). But I think it is time to think about building the old blocking API on top of the new async API and getting rid of the old code. {quote} Why we have AsyncTable and AsyncTableBase again? Do we have to have the two Interfaces? {quote} They were introduced when implementing scan. See the discussion in HBASE-16984. We can discuss later whether we can just have one AsyncTable interface. {quote} Do you have to rename TestAsyncGetMultiThread ? And/or TestAsyncTableMultiGet? {quote} No, it is get, not multi get... I will rename TestAsyncTableMultiGet and add other batch tests to it. Thanks. > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add support for general batch operations based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond
[ https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huaxiang sun updated HBASE-16981: - Status: Patch Available (was: Open) Hi [~jingcheng.du] and [~anoop.hbase], I posted the v3 patch based on the new design. Can you help to review? Thanks! > Expand Mob Compaction Partition policy from daily to weekly, monthly and > beyond > --- > > Key: HBASE-16981 > URL: https://issues.apache.org/jira/browse/HBASE-16981 > Project: HBase > Issue Type: New Feature > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-16981.master.001.patch, > HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, > Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf > > > Today the mob region holds all mob files for all regions. With the daily > partition mob compaction policy, after major mob compaction, there is still > one file per region per day. Given there are 365 days in one year, that is at least 365 > files per region. Since HDFS has a limitation on the number of files under one > folder, this is not going to scale if there are lots of regions. To reduce > the mob file number, we want to introduce other partition policies such as > weekly and monthly to compact mob files within one week or month into one file. > This jira is created to track this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-16981) Expand Mob Compaction Partition policy from daily to weekly, monthly and beyond
[ https://issues.apache.org/jira/browse/HBASE-16981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huaxiang sun updated HBASE-16981: - Attachment: HBASE-16981.master.003.patch > Expand Mob Compaction Partition policy from daily to weekly, monthly and > beyond > --- > > Key: HBASE-16981 > URL: https://issues.apache.org/jira/browse/HBASE-16981 > Project: HBase > Issue Type: New Feature > Components: mob >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-16981.master.001.patch, > HBASE-16981.master.002.patch, HBASE-16981.master.003.patch, > Supportingweeklyandmonthlymobcompactionpartitionpolicyinhbase.pdf > > > Today the mob region holds all mob files for all regions. With the daily > partition mob compaction policy, after major mob compaction, there is still > one file per region per day. Given there are 365 days in one year, that is at least 365 > files per region. Since HDFS has a limitation on the number of files under one > folder, this is not going to scale if there are lots of regions. To reduce > the mob file number, we want to introduce other partition policies such as > weekly and monthly to compact mob files within one week or month into one file. > This jira is created to track this effort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17018) Spooling BufferedMutator
[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768703#comment-15768703 ] Enis Soztutar commented on HBASE-17018: --- Thanks for entertaining my suggestion. bq. On our case we launch ~1K containers per second. If we write 100 metrics each, the total volume written into HBase is considerable HBase in the end will end up writing these events to its own WALs on HDFS. So in terms of scalability, you should be able to achieve HBase throughput much more easily, since HBase is doing a lot more work (RPC, sorting data, flushing to disk, compaction, etc). bq. For large deployments that means that there could be hundreds of parallel writers. That should be fine for HDFS, as long as you have 1 writer per application, rather than 1 writer per task. bq. It would essentially double the hdfs requirement for the storage I was thinking that you would delete the records once the reader has persisted them to HBase. If the application writer is dead, some other application writer eventually finishes persisting to HBase (because the WALs are already there in HDFS). For example, HBase keeps rolling the WAL to a new file every ~100MB. Then the whole file is deleted once we determine that it is not needed anymore. bq. Would the reader still query hbase only and return no data if hbase is missing the data? I think that is determined by the requirements for ATS. You have to determine the "commit point" and the read point semantics. For example, you can have it so that the commit point is the HDFS write. Once it is complete, you ACK the write, which means the HBase write will be "eventually consistent" with the benefit of not depending on HBase availability. Or you can make it so that the commit point is HDFS write + wait for the HBase write for 30 seconds. In this case, you wait for HBase for 30 seconds, but still ACK the write once it hits HDFS after the timeout. It also depends on whether you need read-after-write semantics or not. If so, maybe you keep an in-memory cache for stuff waiting to be written to HBase. Not sure on ATS requirements. > Spooling BufferedMutator > > > Key: HBASE-17018 > URL: https://issues.apache.org/jira/browse/HBASE-17018 > Project: HBase > Issue Type: New Feature >Reporter: Joep Rottinghuis > Attachments: HBASE-17018.master.001.patch, > HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, > HBASE-17018.master.004.patch, > HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements > for fault tolerant writer.pdf > > > For Yarn Timeline Service v2 we use HBase as a backing store. > A big concern we would like to address is what to do if HBase is > (temporarily) down, for example in case of an HBase upgrade. > Most of the high volume writes will be mostly on a best-effort basis, but > occasionally we do a flush. Mainly during application lifecycle events, > clients will call a flush on the timeline service API. In order to handle the > volume of writes we use a BufferedMutator. When flush gets called on our API, > we in turn call flush on the BufferedMutator. > We would like our interface to HBase be able to spool the mutations to a > filesystems in case of HBase errors. If we use the Hadoop filesystem > interface, this can then be HDFS, gcs, s3, or any other distributed storage. > The mutations can then later be re-played, for example through a MapReduce > job. 
> https://reviews.apache.org/r/54882/ > For design of SpoolingBufferedMutatorImpl see > https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
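The spooling idea can be sketched stand-alone: try the primary store first and, if it is down, append the record to a spool file for later replay. The Sink interface and the newline-delimited framing below are hypothetical stand-ins for BufferedMutator and the design doc's actual format:

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: write to the primary store when it is up; on failure,
// append the record to a spool file so a later job can replay it into HBase.
public class SpoolingWriter {

  /** Hypothetical stand-in for a BufferedMutator-backed writer. */
  public interface Sink {
    void write(byte[] record) throws IOException;
  }

  private final Sink primary;
  private final Path spoolFile;

  public SpoolingWriter(Sink primary, Path spoolFile) {
    this.primary = primary;
    this.spoolFile = spoolFile;
  }

  public void write(byte[] record) throws IOException {
    try {
      primary.write(record); // normal path: straight to HBase
    } catch (IOException hbaseDown) {
      // degraded path: spool locally (or to any Hadoop FileSystem) for replay
      try (OutputStream out = Files.newOutputStream(
          spoolFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
        out.write(record);
        out.write('\n');
      }
    }
  }
}
{code}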
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768700#comment-15768700 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.3-JDK8 #84 (See [https://builds.apache.org/job/HBase-1.3-JDK8/84/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 16583cd4f9a3219ce710d180447547b890268bf1) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
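The short-term fix described above amounts to bounding the Future.get() call. In plain JDK terms (the 30-second figure is illustrative, not necessarily the value the patch chose):

{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Bound a potentially hanging Future.get() so a stuck endpoint stop() cannot
// block the caller (possibly the ZK event thread) forever.
public class TerminationWithTimeout {
  public static void awaitStop(Future<?> stopFuture)
      throws ExecutionException, InterruptedException {
    try {
      stopFuture.get(30, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      // Give up waiting rather than block; shipping may retry, so no data is lost.
      stopFuture.cancel(true);
      System.err.println("WARN: replication endpoint did not stop in time: " + e);
    }
  }
}
{code}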
[jira] [Commented] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies
[ https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768687#comment-15768687 ] Hadoop QA commented on HBASE-17001: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 4s {color} | {color:red} HBASE-17001 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/0.3.0/precommit-patchnames for help. {color} | \\ \\ || Subsystem || Report/Notes || | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844328/HBASE-17001.003.patch | | JIRA Issue | HBASE-17001 | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/5016/console | | Powered by | Apache Yetus 0.3.0 http://yetus.apache.org | This message was automatically generated. > [RegionServer] Implement enforcement of quota violation policies > > > Key: HBASE-17001 > URL: https://issues.apache.org/jira/browse/HBASE-17001 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0 > > Attachments: HBASE-17001.001.patch, HBASE-17001.003.patch > > > When the master enacts a quota violation policy, the RegionServers need to > actually enforce that policy per its definition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768688#comment-15768688 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.2-JDK8 #76 (See [https://builds.apache.org/job/HBase-1.2-JDK8/76/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 18dc7386bc9adff834db851a38306989fb3fd4a6) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17001) [RegionServer] Implement enforcement of quota violation policies
[ https://issues.apache.org/jira/browse/HBASE-17001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-17001: --- Attachment: HBASE-17001.003.patch .003 This was a brutal re-write. Turns out to support the "proactive rejection" of bulk loads that would violate a quota, we need to start tracking the quota information much differently. We have to know what the current size of a table is and what it's allowed to be (the current quota limit). There was a bit of cleanup along the way that was beneficial. Overall, a good exercise at least. > [RegionServer] Implement enforcement of quota violation policies > > > Key: HBASE-17001 > URL: https://issues.apache.org/jira/browse/HBASE-17001 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0 > > Attachments: HBASE-17001.001.patch, HBASE-17001.003.patch > > > When the master enacts a quota violation policy, the RegionServers need to > actually enforce that policy per its definition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
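The bookkeeping described here, knowing each table's current size and its quota limit so a bulk load can be rejected proactively, can be sketched as follows; all names are hypothetical, not the patch's classes:

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: track each table's current size against its quota so
// an incoming bulk load can be rejected before it pushes the table over.
public class TableQuotaTracker {
  private final ConcurrentMap<String, AtomicLong> currentSizes = new ConcurrentHashMap<>();
  private final ConcurrentMap<String, Long> limits = new ConcurrentHashMap<>();

  public void setLimit(String table, long limitBytes) {
    limits.put(table, limitBytes);
  }

  public void reportSize(String table, long sizeBytes) {
    currentSizes.computeIfAbsent(table, t -> new AtomicLong()).set(sizeBytes);
  }

  /** @return false if adding incomingBytes would violate the table's quota. */
  public boolean mayBulkLoad(String table, long incomingBytes) {
    Long limit = limits.get(table);
    if (limit == null) {
      return true; // no quota configured for this table
    }
    AtomicLong current = currentSizes.get(table);
    long size = current == null ? 0L : current.get();
    return size + incomingBytes <= limit;
  }
}
{code}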
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768562#comment-15768562 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.1-JDK7 #1829 (See [https://builds.apache.org/job/HBase-1.1-JDK7/1829/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 1999c15a9adf774c39478d181accd6a15bdf29ff) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768515#comment-15768515 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #74 (See [https://builds.apache.org/job/HBase-1.3-JDK7/74/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 16583cd4f9a3219ce710d180447547b890268bf1) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768487#comment-15768487 ] Hudson commented on HBASE-17341: SUCCESS: Integrated in Jenkins build HBase-1.1-JDK8 #1913 (See [https://builds.apache.org/job/HBase-1.1-JDK8/1913/]) HBASE-17341 Add a timeout during replication endpoint termination (apurtell: rev 1999c15a9adf774c39478d181accd6a15bdf29ff) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17018) Spooling BufferedMutator
[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768468#comment-15768468 ] Sangjin Lee commented on HBASE-17018: - Your suggestion is interesting [~enis]. Thanks for the idea. In addition to what Joep mentioned above, I do worry about the capacity requirement a dual-writing system would have. It would essentially double the hdfs requirement for the storage, and at large scale it would add up to a meaningful amount. Also, how would a reader work in the case where the data made it into hdfs but not into hbase (e.g. hbase cluster was down for a while for an upgrade)? Would the reader still query hbase only and return no data if hbase is missing the data? If we want to address that situation, we're putting back the unspooling (migrating missing data from the backup location to hbase). I'm just trying to round out the idea... Thanks! > Spooling BufferedMutator > > > Key: HBASE-17018 > URL: https://issues.apache.org/jira/browse/HBASE-17018 > Project: HBase > Issue Type: New Feature >Reporter: Joep Rottinghuis > Attachments: HBASE-17018.master.001.patch, > HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, > HBASE-17018.master.004.patch, > HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements > for fault tolerant writer.pdf > > > For Yarn Timeline Service v2 we use HBase as a backing store. > A big concern we would like to address is what to do if HBase is > (temporarily) down, for example in case of an HBase upgrade. > Most of the high volume writes will be mostly on a best-effort basis, but > occasionally we do a flush. Mainly during application lifecycle events, > clients will call a flush on the timeline service API. In order to handle the > volume of writes we use a BufferedMutator. When flush gets called on our API, > we in turn call flush on the BufferedMutator. > We would like our interface to HBase be able to spool the mutations to a > filesystems in case of HBase errors. If we use the Hadoop filesystem > interface, this can then be HDFS, gcs, s3, or any other distributed storage. > The mutations can then later be re-played, for example through a MapReduce > job. > https://reviews.apache.org/r/54882/ > For design of SpoolingBufferedMutatorImpl see > https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768453#comment-15768453 ] Hudson commented on HBASE-17314: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2174 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2174/]) Revert "HBASE-17314 Limit total buffered size for all replication (stack: rev a1d2ff4646743a9136bb1182c0512bce28e358b7) * (delete) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestGlobalThrottler.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationEndpoint.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java * (edit) hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory which will increase the pressure of GC, even > maybe OOM because we will read entries for 64MB to buffer in default for one > source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
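The core idea of the (reverted, to be reapplied) change is a global byte budget shared by all replication sources: reserve bytes before buffering WAL entries, release them once shipped. A minimal sketch, independent of the actual patch:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a global buffer quota shared across replication
// sources: a source reserves bytes before buffering entries and releases
// them after shipping, keeping total buffered memory bounded.
public class GlobalBufferQuota {
  private final long limitBytes;
  private final AtomicLong used = new AtomicLong();

  public GlobalBufferQuota(long limitBytes) {
    this.limitBytes = limitBytes;
  }

  /** Try to reserve; on failure the source should stop reading more entries. */
  public boolean tryAcquire(long bytes) {
    while (true) {
      long cur = used.get();
      if (cur + bytes > limitBytes) {
        return false; // over budget: caller backs off instead of buffering
      }
      if (used.compareAndSet(cur, cur + bytes)) {
        return true;
      }
    }
  }

  public void release(long bytes) {
    used.addAndGet(-bytes);
  }
}
{code}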
[jira] [Commented] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers
[ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768454#comment-15768454 ] Hudson commented on HBASE-5401: --- FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2174 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2174/]) HBASE-5401 PerformanceEvaluation generates 10x the number of expected (stack: rev d787155fd24c576b3220372dbb7286d5e291) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/TestPerformanceEvaluation.java > PerformanceEvaluation generates 10x the number of expected mappers > -- > > Key: HBASE-5401 > URL: https://issues.apache.org/jira/browse/HBASE-5401 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0 >Reporter: Oliver Meyn >Assignee: Yi Liang > Fix For: 2.0.0 > > Attachments: HBASE-5401-V1.patch > > > With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation > randomWrite 10' there are 100 mappers spawned, rather than the expected 10. > The culprit appears to be the outer loop in writeInputFile which sets up 10 > splits for every "asked-for client". I think the fix is just to remove that > outer loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
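A condensed, hypothetical view of the fix the description calls for: drop the outer 10x loop so exactly one input split is emitted per asked-for client, making mapper count equal client count:

{code}
import java.util.ArrayList;
import java.util.List;

// Hypothetical condensed view of the HBASE-5401 change: the old writeInputFile
// emitted 10 splits per asked-for client (10x mappers); emitting one split per
// client makes the number of tasks equal the client count.
public class SplitPlanner {
  public static List<String> planSplits(int clients, int rowsPerClient) {
    List<String> splits = new ArrayList<>();
    for (int client = 0; client < clients; client++) {
      int startRow = client * rowsPerClient;
      splits.add("startRow=" + startRow + ", perClientRunRows=" + rowsPerClient);
    }
    return splits; // one mapper input per client
  }
}
{code}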
[jira] [Commented] (HBASE-16010) Put draining function through Admin API
[ https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768204#comment-15768204 ] Jerry He commented on HBASE-16010: -- Hi, [~enis] bq. I think we should have the ACL changes in this patch as well. Otherwise, it will get forgotten and leave a security hole We surely need to get the ACL in. But let's get this JIRA with the protobuf changes in first? Mixing the ACL observers and the protobuf changes will probably bloat the patch and be confusing. Let me open a subtask right away and make sure it will be in. Currently, the decommission works this way (I played with it recently): 1. Put the server in drain mode. 2. Move the regions off with the region mover. You think we should combine the two steps into one? bq. "decommissioning" a server should be integral to the new assignment manager in the sense that the core assignment should be aware of decommissioning servers. I think currently if a server is in drain mode, serverManager/assignment/balancer will skip it as a candidate server. But I am not sure about the details. > Put draining function through Admin API > --- > > Key: HBASE-16010 > URL: https://issues.apache.org/jira/browse/HBASE-16010 > Project: HBase > Issue Type: Improvement >Reporter: Jerry He >Assignee: Matt Warhaftig >Priority: Minor > Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch > > > Currently, there is no Admin API for draining function. Client has to > interact directly with Zookeeper draining node to add and remove draining > servers. > For example, in draining_servers.rb: > {code} > zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, > "draining_servers", nil) > parentZnode = zkw.drainingZNode > begin > for server in servers > node = ZKUtil.joinZNode(parentZnode, server) > ZKUtil.createAndFailSilent(zkw, node) > end > ensure > zkw.close() > end > {code} > This is not good in cases like secure clusters with protected Zookeeper nodes. > Let's put draining function through Admin API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
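For contrast, the Ruby snippet quoted in the description rendered in Java, assuming the same ZooKeeperWatcher/ZKUtil API the script relies on; this direct-to-ZK coupling is exactly what an Admin API would remove:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;

// A Java rendering of the draining_servers.rb logic above: the client writes
// server names directly under the draining znode, bypassing any Admin API.
public class AddDrainingServers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ZooKeeperWatcher zkw = new ZooKeeperWatcher(conf, "draining_servers", null);
    try {
      for (String server : args) { // e.g. "host,16020,1478578389506"
        String node = ZKUtil.joinZNode(zkw.drainingZNode, server);
        ZKUtil.createAndFailSilent(zkw, node);
      }
    } finally {
      zkw.close();
    }
  }
}
{code}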
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768275#comment-15768275 ] Hudson commented on HBASE-17341: FAILURE: Integrated in Jenkins build HBase-1.4 #576 (See [https://builds.apache.org/job/HBase-1.4/576/]) HBASE-17341 Add a timeout during replication endpoint termination (tedyu: rev f94180a3e9820761d59be98a62db9d218a096e5b) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.8 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-17357) PerformanceEvaluation parameters parsing triggers NPE.
Jean-Marc Spaggiari created HBASE-17357: --- Summary: PerformanceEvaluation parameters parsing triggers NPE. Key: HBASE-17357 URL: https://issues.apache.org/jira/browse/HBASE-17357 Project: HBase Issue Type: Bug Affects Versions: 1.2.4 Reporter: Jean-Marc Spaggiari Priority: Minor When using wrong parameters, PE triggers an NPE. It should not. {code} @hbasetest1:~# hbase pe --nomapred 16/12/21 16:38:50 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available java.lang.NullPointerException at java.util.TreeMap.getEntry(TreeMap.java:342) at java.util.TreeMap.get(TreeMap.java:273) at org.apache.hadoop.hbase.PerformanceEvaluation.determineCommandClass(PerformanceEvaluation.java:2145) at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:2127) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:2150) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
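One possible guard (names are illustrative, not the exact PerformanceEvaluation code): check for a missing command before the lookup, since TreeMap.get(null) under natural ordering throws exactly the NPE seen in the trace:

{code}
import java.util.TreeMap;

// Illustrative guard for the reported NPE: look up the command class only
// when a command name was actually parsed, and print usage otherwise.
public class CommandLookup {
  private static final TreeMap<String, Class<?>> COMMANDS = new TreeMap<>();

  static Class<?> determineCommandClass(String cmd) {
    // TreeMap.get(null) throws NullPointerException, which is what the
    // 'hbase pe --nomapred' invocation (options but no command) hits.
    return cmd == null ? null : COMMANDS.get(cmd);
  }

  public static void main(String[] args) {
    String cmd = args.length > 0 ? args[0] : null;
    Class<?> cls = determineCommandClass(cmd);
    if (cls == null) {
      System.err.println("Usage: PerformanceEvaluation [opts] <command> <nclients>");
      System.exit(1);
    }
  }
}
{code}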
[jira] [Commented] (HBASE-16010) Put draining function through Admin API
[ https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768178#comment-15768178 ] Jerry He commented on HBASE-16010: -- bq. MasterRpcServices already has a ServerName import entry (org.apache.hadoop.hbase.ServerName) and they would conflict. Use HBaseProtos.ServerName > Put draining function through Admin API > --- > > Key: HBASE-16010 > URL: https://issues.apache.org/jira/browse/HBASE-16010 > Project: HBase > Issue Type: Improvement >Reporter: Jerry He >Assignee: Matt Warhaftig >Priority: Minor > Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch > > > Currently, there is no Admin API for draining function. Client has to > interact directly with Zookeeper draining node to add and remove draining > servers. > For example, in draining_servers.rb: > {code} > zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, > "draining_servers", nil) > parentZnode = zkw.drainingZNode > begin > for server in servers > node = ZKUtil.joinZNode(parentZnode, server) > ZKUtil.createAndFailSilent(zkw, node) > end > ensure > zkw.close() > end > {code} > This is not good in cases like secure clusters with protected Zookeeper nodes. > Let's put draining function through Admin API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers
[ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5401: - Hadoop Flags: Incompatible change,Reviewed (was: Reviewed) Marked it incompatible change. > PerformanceEvaluation generates 10x the number of expected mappers > -- > > Key: HBASE-5401 > URL: https://issues.apache.org/jira/browse/HBASE-5401 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0 >Reporter: Oliver Meyn >Assignee: Yi Liang > Fix For: 2.0.0 > > Attachments: HBASE-5401-V1.patch > > > With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation > randomWrite 10' there are 100 mappers spawned, rather than the expected 10. > The culprit appears to be the outer loop in writeInputFile which sets up 10 > splits for every "asked-for client". I think the fix is just to remove that > outer loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-5401) PerformanceEvaluation generates 10x the number of expected mappers
[ https://issues.apache.org/jira/browse/HBASE-5401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-5401: - Resolution: Fixed Hadoop Flags: Reviewed Release Note: Changes how many tasks PE runs when clients are mapreduce. Now tasks == client count. Previously we hardcoded ten tasks per client instance. Status: Resolved (was: Patch Available) Pushed. Makes sense. This baffled you and Oliver. That's enough. Thanks for the patch [~easyliangjob] > PerformanceEvaluation generates 10x the number of expected mappers > -- > > Key: HBASE-5401 > URL: https://issues.apache.org/jira/browse/HBASE-5401 > Project: HBase > Issue Type: Bug > Components: test >Affects Versions: 2.0.0 >Reporter: Oliver Meyn >Assignee: Yi Liang > Fix For: 2.0.0 > > Attachments: HBASE-5401-V1.patch > > > With a command line like 'hbase org.apache.hadoop.hbase.PerformanceEvaluation > randomWrite 10' there are 100 mappers spawned, rather than the expected 10. > The culprit appears to be the outer loop in writeInputFile which sets up 10 > splits for every "asked-for client". I think the fix is just to remove that > outer loop. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Purtell updated HBASE-17341: --- Resolution: Fixed Fix Version/s: 0.98.24 1.1.8 1.2.5 1.3.0 Status: Resolved (was: Patch Available) > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 1.1.8, 0.98.24 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767960#comment-15767960 ] Hudson commented on HBASE-17341: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2173 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2173/]) HBASE-17341 Add a timeout during replication endpoint termination (tedyu: rev cac0904c16dde9eb7bdbb57e4a33224dd4edb77f) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17352) Fix hbase-assembly build with bash 4
[ https://issues.apache.org/jira/browse/HBASE-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767961#comment-15767961 ] Hudson commented on HBASE-17352: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2173 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2173/]) HBASE-17352 Fix hbase-assembly build with bash 4 (Junegunn Choi) (tedyu: rev acd0218d91bac9410f7b9bc68f66aa065fd47d55) * (edit) hbase-assembly/pom.xml > Fix hbase-assembly build with bash 4 > > > Key: HBASE-17352 > URL: https://issues.apache.org/jira/browse/HBASE-17352 > Project: HBase > Issue Type: Bug >Reporter: Junegunn Choi >Assignee: Junegunn Choi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17352.patch > > > hbase-assembly fails to build with bash 4. > {noformat} > [DEBUG] Executing command line: [env, bash, -c, cat > maven-shared-archive-resources/META-INF/NOTICE \ > `find > /Users/jg/github/hbase/hbase-assembly/target/dependency -iname NOTICE -or > -iname NOTICE.txt` \] > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec > (concat-NOTICE-files) on project hbase-assembly: Command execution failed. > Process exited with an error: 1 (Exit value: 1) -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (concat-NOTICE-files) on > project hbase-assembly: Command execution failed. > {noformat} > The error is caused by the trailing backslash in the bash command for > {{concat-NOTICE-files}}. You can see the behavioral difference between bash 3 > and 4 with the following snippet. > {code} > $ # Using bash 3 > $ /bin/bash -c 'cat <(echo foo) \' && echo good || echo bad > foo > good > $ # Using bash 4 > $ /usr/local/bin/bash -c 'cat <(echo foo) \' && echo good || echo bad > foo > cat: \: No such file or directory > bad > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17355) Create a simplifed version of flush scanner
[ https://issues.apache.org/jira/browse/HBASE-17355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767948#comment-15767948 ] stack commented on HBASE-17355: --- Nice experiment. What diff do you see [~ram_krish]? (How are you reading the profiling?) > Create a simplifed version of flush scanner > --- > > Key: HBASE-17355 > URL: https://issues.apache.org/jira/browse/HBASE-17355 > Project: HBase > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0 > > Attachments: HBASE-17354.patch, after patch.png, before patch.png > > > Currently we use StoreScanner for performing the flushes, which actually goes > row by row. Probably that is not needed, and we could always go ahead with a > simple loop that collects the cells and writes them to the file. Inside the > write path we have the required sanity checks, so the store scanner does not > need to do a sanity check. > Also, the limit retrieved in one next() call could be set to the configured > block size, as we do for compaction. > Are there any filters that we want to apply (I mean any version check or > deletion) that we need to check in flush? If so then this simplified version > will not work. I may be missing something, but if so we need to see what > those are and add them here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
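The proposed simple loop might look like the sketch below: drain the memstore's sorted cell iterator in block-sized batches rather than going row by row through StoreScanner. The CellWriter interface is a hypothetical stand-in for the store file writer:

{code}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the "simple loop" flush: collect cells from the
// memstore's sorted iterator and write them out batch by batch, instead of
// driving the row-by-row StoreScanner machinery.
public class SimpleFlush<CELL> {

  /** Hypothetical stand-in for the store file writer. */
  public interface CellWriter<C> {
    void append(List<C> batch) throws IOException;
  }

  public void flush(Iterator<CELL> sortedCells, CellWriter<CELL> writer, int batchSize)
      throws IOException {
    List<CELL> batch = new ArrayList<>(batchSize);
    while (sortedCells.hasNext()) {
      batch.add(sortedCells.next());
      if (batch.size() >= batchSize) { // roughly one block's worth, as compaction does
        writer.append(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) {
      writer.append(batch); // write out the final partial batch
    }
  }
}
{code}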
[jira] [Commented] (HBASE-17338) Treat Cell data size under global memstore heap size only when that Cell can not be copied to MSLAB
[ https://issues.apache.org/jira/browse/HBASE-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767923#comment-15767923 ] stack commented on HBASE-17338: --- Do we have to check if on or offheap MSLAB? 86 /** 87 * @return Whether off heap based MSLAB in place. 88 */ 89 boolean isOffheap(); Can we not have MSLAB work the same whether on or offheap? This sort of check... 101 // issues or even OOME. 102 if (this.memStoreLAB != null && this.memStoreLAB.isOffheap()) { 103 heapOverheadDelta += cellLen; 104 } ... presumes that MSLAB is done in either of two ways. This check is done apart from the implementation. Is there copy/paste of code (going by your dup'ing the comment)? I need to read on why Append/Increment can't be out in offheap. This is good stuff though [~anoop.hbase] > Treat Cell data size under global memstore heap size only when that Cell can > not be copied to MSLAB > --- > > Key: HBASE-17338 > URL: https://issues.apache.org/jira/browse/HBASE-17338 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 2.0.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-17338.patch > > > We have only data size and heap overhead being tracked globally. Off heap > memstore works with an off heap backed MSLAB pool. But a cell, when added to > memstore, is not always copied to MSLAB. Append/Increment ops doing an > upsert don't use MSLAB. Also, based on the Cell size, we sometimes avoid the > MSLAB copy. But now we track this cell data size also under the global > memstore data size, which indicates off heap size in case of off heap > memstore. For global checks for flushes (against lower/upper watermark > levels), we check this size against the max off heap memstore size. We do check > heap overhead against global heap memstore size (defaults to 40% of xmx). But > for such cells the data size also should be accounted under the heap overhead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
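The accounting rule the description asks for can be sketched as follows; this is an illustration of the rule, not the patch's code. A cell copied into an off-heap MSLAB chunk counts against off-heap data size, while a cell that stayed on heap (upsert, or a skipped MSLAB copy) must count against heap size:

{code}
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the sizing rule: per-cell bookkeeping overhead is
// always on heap; the cell's data is counted off heap only when the cell was
// actually copied into an off-heap MSLAB chunk.
public class MemStoreSizing {
  private final AtomicLong offHeapDataSize = new AtomicLong();
  private final AtomicLong heapSize = new AtomicLong();

  public void accountCell(long cellLen, long perCellOverhead, boolean copiedToOffHeapMSLAB) {
    heapSize.addAndGet(perCellOverhead); // bookkeeping overhead always lives on heap
    if (copiedToOffHeapMSLAB) {
      offHeapDataSize.addAndGet(cellLen);
    } else {
      heapSize.addAndGet(cellLen); // cell data itself stayed on heap, count it there
    }
  }

  public long getOffHeapDataSize() { return offHeapDataSize.get(); }
  public long getHeapSize() { return heapSize.get(); }
}
{code}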
[jira] [Commented] (HBASE-16010) Put draining function through Admin API
[ https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767918#comment-15767918 ] Enis Soztutar commented on HBASE-16010: --- I think we should have the ACL changes in this patch as well. Otherwise, it will get forgotten and leave a security hole. Instead of draining, we should use the term decommission/recommission I think. And GetDrainingServers should be ListDrainingServers, or ListDecommissionedServers. This is obviously already broken, but the new API right now only puts the server in "draining" mode and does not do anything else. Is there a plan to bring the actual functionality (of moving regions out of the RS) into the master as well? As I have noted elsewhere, "decommissioning" a server should be integral to the new assignment manager in the sense that the core assignment should be aware of decommissioning servers. [~stack], [~syuanjiang] what do you guys think? Does the current stuff have ways to address that? > Put draining function through Admin API > --- > > Key: HBASE-16010 > URL: https://issues.apache.org/jira/browse/HBASE-16010 > Project: HBase > Issue Type: Improvement >Reporter: Jerry He >Assignee: Matt Warhaftig >Priority: Minor > Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch > > > Currently, there is no Admin API for draining function. Client has to > interact directly with Zookeeper draining node to add and remove draining > servers. > For example, in draining_servers.rb: > {code} > zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, > "draining_servers", nil) > parentZnode = zkw.drainingZNode > begin > for server in servers > node = ZKUtil.joinZNode(parentZnode, server) > ZKUtil.createAndFailSilent(zkw, node) > end > ensure > zkw.close() > end > {code} > This is not good in cases like secure clusters with protected Zookeeper nodes. > Let's put draining function through Admin API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767906#comment-15767906 ] Andrew Purtell commented on HBASE-17341: This is flagged as a critical fix and we hit it in production, so I'm going to commit everywhere. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767905#comment-15767905 ] stack commented on HBASE-17314: --- Reverted. Reopened. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory which will increase the pressure of GC, even > maybe OOM because we will read entries for 64MB to buffer in default for one > source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack reopened HBASE-17314: --- Let me revert. The failing test is messing up other devs. Can reapply w/ addendum no problem. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory which will increase the pressure of GC, even > maybe OOM because we will read entries for 64MB to buffer in default for one > source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767877#comment-15767877 ] stack commented on HBASE-17341: --- Thanks for clarification [~vincentpoon]. If no dataloss and we retry, WARN seems fine by me. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767791#comment-15767791 ] Vincent Poon commented on HBASE-17341: -- [~stack] I don't believe we'll lose data if you timeout even during shipping. source#shipEdits() doesn't remove entries from the queue until endpoint#replicate() returns success. So at worst, you ship the data more than once. Looking at it now I suppose ERROR would make sense, though WARN is no worse than what was there before. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator
[ https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767773#comment-15767773 ] stack commented on HBASE-17334: --- Skimmed. LGTM. I like the enum for what type of location. Below are nits that can be addressed on commit or in the next patch... It just gets a bit confusing when you add a shortcut for 'before': this.before = before; ... rather than testing the enum if == BEFORE. nit: do a switch instead of if/else: if (type == RegionLocateType.BEFORE) { RegionLocateType is a nice improvement. > Add locate row before/after support for AsyncRegionLocator > -- > > Key: HBASE-17334 > URL: https://issues.apache.org/jira/browse/HBASE-17334 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17334-v1.patch, HBASE-17334.patch > > > Now we only have a getPreviousRegionLocation method which is only used for > reverse scan, and it is not perfect as it cannot deal with region merges. As > we want to add inclusive/exclusive support for start row and end row of a > scan, we need to implement general locate to row before/after method for > AsyncRegionLocator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
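The switch nit rendered as code, with a hypothetical rendering of the RegionLocateType enum (only BEFORE is confirmed by the discussion; the other constants are assumptions for illustration):

{code}
// Illustrative: branch on the RegionLocateType enum with a switch instead of
// caching a 'before' boolean shortcut, which keeps the locate semantics in
// one obvious place and extends cleanly if more locate types are added.
public class LocateExample {
  enum RegionLocateType { BEFORE, CURRENT, AFTER } // CURRENT/AFTER are assumed names

  static String describe(RegionLocateType type) {
    switch (type) {
      case BEFORE:
        return "locate the region containing the row just before the given row";
      case AFTER:
        return "locate the region containing the row just after the given row";
      default:
        return "locate the region containing the given row";
    }
  }
}
{code}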
[jira] [Updated] (HBASE-15130) Backport 0.98 Scan different TimeRange for each column family
[ https://issues.apache.org/jira/browse/HBASE-15130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] churro morales updated HBASE-15130: --- Resolution: Won't Fix Status: Resolved (was: Patch Available) > Backport 0.98 Scan different TimeRange for each column family > -- > > Key: HBASE-15130 > URL: https://issues.apache.org/jira/browse/HBASE-15130 > Project: HBase > Issue Type: Bug > Components: Client, regionserver, Scanners >Affects Versions: 0.98.17 >Reporter: churro morales >Assignee: churro morales > Fix For: 0.98.24 > > Attachments: HBASE-15130-0.98.patch, HBASE-15130-0.98.v1.patch, > HBASE-15130-0.98.v1.patch, HBASE-15130-0.98.v2.patch, > HBASE-15130-0.98.v3.patch, HBASE-15130-0.98.v4.patch > > > branch 98 version backport for HBASE-14355 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767757#comment-15767757 ] Andrew Purtell commented on HBASE-17341: We can do an addendum if warranted. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767751#comment-15767751 ] Andrew Purtell commented on HBASE-17069: Alright, let me run head of branch-1.2 and see if it repros with TRACE level logging. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, 
> heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767745#comment-15767745 ] stack commented on HBASE-17069: --- Either. Need to dig in on this issue. Need to make a start somewhere. I don't mind doing the digging if you are doing the running of the test. [~apurtell] > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > 
currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=128
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767740#comment-15767740 ] Andrew Purtell commented on HBASE-17069: bq. They are for 1.3 run? You want all logs from a 1.3 run? I can redo. Or 1.2. Made to order. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, 
maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767736#comment-15767736 ] stack commented on HBASE-17341: --- [~vincentpoon] See above. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767734#comment-15767734 ] Andrew Purtell commented on HBASE-17069: bq. I've not magic other than logging and asserts I can run binaries for you. bq. How long to repro? Usually fails within a few hours, sometimes needs an overnight. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: 
blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, free
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767730#comment-15767730 ] stack commented on HBASE-17069: --- You mean logs from November 10th [~apurtell]? They are for 1.3 run? I've not magic other than logging and asserts unfortunately. How long to repro? > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > 
currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712
[jira] [Comment Edited] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767723#comment-15767723 ] Andrew Purtell edited comment on HBASE-17341 at 12/21/16 6:13 PM: -- Since Ted committed this I will pick to 0.98 now. I missed it if there was an announcement that branch-1.3 is closed. I committed another of Vincent's replication fixes there yesterday. We should probably commit this one too now that the deed has been done. was (Author: apurtell): Since Ted committed this I will pick to 0.98 now and resolve. I missed it if there was an announcement that branch-1.3 is closed. I committed another of Vincent's replication fixes there yesterday. We should probably commit this one too now that the deed has been done. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767723#comment-15767723 ] Andrew Purtell commented on HBASE-17341: Since Ted committed this I will pick to 0.98 now and resolve. I missed it if there was an announcement that branch-1.3 is closed. I committed another of Vincent's replication fixes there yesterday. We should probably commit this one too now that the deed has been done. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767711#comment-15767711 ] Andrew Purtell commented on HBASE-17069: I attached logs from one failed run with 1.2 on this issue. They indicate the problem but not the cause. (Or maybe I missed it.) I plan to look over the HBASE-14465 diff closely, and read the affected code in place, and probably introduce more logging temporarily in suspect places. Other suggestions? > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 
04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddf
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767699#comment-15767699 ] stack commented on HBASE-17069: --- [~apurtell] How we debug? Logs to look at or something? > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, 
minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFa
[jira] [Comment Edited] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767531#comment-15767531 ] Andrew Purtell edited comment on HBASE-17069 at 12/21/16 5:59 PM: -- FWIW I tested the head of branch-1.3 in my rig and it failed the same way, "no serialized HRegionInfo" in some rows in meta, with resulting job failure as part of the keyspace went missing. [~mantonov] [~ghelmling] was (Author: apurtell): FWIW I tested the head of branch-1.3 in my rig and it failed the same way, "no serialized HRegionInfo" in some rows in meta, with resulting job failure as part of the keyspace went missing. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, 
cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWri
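For anyone chasing the "no serialized HRegionInfo" symptom mentioned above, a rough diagnostic sketch against the 1.x client API that scans hbase:meta for rows whose info:regioninfo cell is missing; configuration setup is assumed:
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Diagnostic sketch: print hbase:meta rows lacking a serialized HRegionInfo.
public class MetaRegionInfoCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create(); // assumes site config on classpath
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table meta = conn.getTable(TableName.META_TABLE_NAME);
         ResultScanner scanner =
             meta.getScanner(new Scan().addFamily(HConstants.CATALOG_FAMILY))) {
      for (Result r : scanner) {
        if (r.getValue(HConstants.CATALOG_FAMILY, HConstants.REGIONINFO_QUALIFIER) == null) {
          System.out.println("No serialized HRegionInfo in " + Bytes.toStringBinary(r.getRow()));
        }
      }
    }
  }
}
{code}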
[jira] [Commented] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767632#comment-15767632 ] stack commented on HBASE-17345: --- {code} 72 * And the {@link #maxAttempts} is a limit for each single operation in the batch logically. In the 73 * implementation, we will record a {@code tries} parameter for each operation group, and if it is 74 * split to several groups when retrying, the sub groups will inherit the {@code tries}. You can 75 * imagine that the whole retrying process is a tree, and the {@link #maxAttempts} is the limit of 76 * the depth of the tree. 77 */ {code} Trying to understand, the tree will only have a depth of one; i.e. a branch for each regionserver the batch is going against? Each branch can run up its own maxAttempts? The tries is not shared amongst the branches? Regardless of how many retries, the operation will stop after operationTimeoutNs? If so, sounds good. It has to be an Impl in the below, it can't be Interface? private final AsyncConnectionImpl conn; What is up w/ below? this.startLogErrorsCnt = 0;// startLogErrorsCnt; We take startLogErrorsCnt as a param but ignore it? You make a new Action from passed-in Action because you don't want to modify passed-in params? 139 Action action = new Action(rawAction, i); super nit: you can presize the following: 148 this.action2Errors = new IdentityHashMap<>(); Perhaps if TRACE-level logging, log every attempt: 164 if (tries > startLogErrorsCnt) { ? Is it right to set this to WARN since it might succeed on next attempt? LOG.warn("Process batch for " ... maybe I'm reading it wrong though? nit: give this method a better name: 174 private String getExtras(ServerName serverName) { 175 return serverName != null ? serverName.getServerName() : ""; 176 } You should use the above method here? 4 serverName != null ? serverName.toString() : "")); This is just to log? 208 long currentTime = System.currentTimeMillis(); i.e. all timing is with nanos but millis is just for logging? This is a crazy amount of work! I like how this patch is getting better on each iteration; i.e. public MultiGetCallerBuilder multiGet() { becomes public BatchCallerBuilder batch() { Skimmed after reading 1/4. What do you see AsyncBatchRpcRetryingCaller replacing in our current stack? It seems to do AP and a bunch of our Callable infra. Should AsyncBatchRpcRetryingCaller implement Callable? Or what you thinking? Generally no * imports: 1 import static org.apache.hadoop.hbase.client.ConnectionUtils.*; Why we have AsyncTable and AsyncTableBase again? Do we have to have the two Interfaces? Do you have to rename TestAsyncGetMultiThread ? And/or TestAsyncTableMultiGet? This is nice work. > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add the support for general batch based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
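As a reading aid for the quoted javadoc, a toy model (not the patch's code) of how sub-groups inherit {@code tries} on a retry split, so the depth of the retry tree is capped by maxAttempts; class and field names are invented:
{code}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Toy model: on retry, a failed group is re-split by destination server, and
// each sub-group inherits the parent's tries, so an operation's attempts
// along any branch of the retry tree never exceed maxAttempts.
class ActionGroup<A> {
  final List<A> actions;
  final int tries; // attempts already consumed along this branch

  ActionGroup(List<A> actions, int tries) {
    this.actions = actions;
    this.tries = tries;
  }

  List<ActionGroup<A>> retrySplit(Collection<List<A>> regroupedByServer, int maxAttempts) {
    List<ActionGroup<A>> subGroups = new ArrayList<>();
    if (tries + 1 >= maxAttempts) {
      return subGroups; // branch exhausted: the remaining operations fail
    }
    for (List<A> sub : regroupedByServer) {
      subGroups.add(new ActionGroup<>(sub, tries + 1)); // inherit and increment
    }
    return subGroups;
  }
}
{code}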
[jira] [Updated] (HBASE-17250) For Get and scan in one case, checkFamily can be skipped in Region#getScanner
[ https://issues.apache.org/jira/browse/HBASE-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huaxiang sun updated HBASE-17250: - Status: Patch Available (was: Open) > For Get and scan in one case, checkFamily can be skipped in Region#getScanner > - > > Key: HBASE-17250 > URL: https://issues.apache.org/jira/browse/HBASE-17250 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > Attachments: HBASE-17250-master-001.patch, > HBASE-17250-master-002.patch > > > For get(), checkFamily is done in prepareGet(), so checkFamily can be skipped > in Region#getScanner(). For scan(), if there is no family configured in the scan, > the families are from the table descriptor, so checkFamily in > Region#getScanner() can be skipped in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-17250) For Get and scan in one case, checkFamily can be skipped in Region#getScanner
[ https://issues.apache.org/jira/browse/HBASE-17250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] huaxiang sun updated HBASE-17250: - Attachment: HBASE-17250-master-002.patch Submitted a patch. One issue with coprocessors is that they could change the families in the scan; to be safe, recheck the families in getScanner() after the coprocessor runs. > For Get and scan in one case, checkFamily can be skipped in Region#getScanner > - > > Key: HBASE-17250 > URL: https://issues.apache.org/jira/browse/HBASE-17250 > Project: HBase > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: huaxiang sun >Assignee: huaxiang sun >Priority: Minor > Attachments: HBASE-17250-master-001.patch, > HBASE-17250-master-002.patch > > > For get(), checkFamily is done in prepareGet(), so checkFamily can be skipped > in Region#getScanner(). For scan(), if there is no family configured in the scan, > the families are from the table descriptor, so checkFamily in > Region#getScanner() can be skipped in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
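A simplified sketch of the flow described in this comment; names such as familiesAlreadyChecked and instantiateRegionScanner are illustrative stand-ins for the real Region internals, not the patch's code:
{code}
// Illustrative fragment: skip checkFamily when the caller already validated
// families (get() via prepareGet(), or a scan with no explicit families,
// which inherits them from the table descriptor), and re-check after the
// coprocessor pre-hook, since a hook may mutate the scan's families.
RegionScanner getScanner(Scan scan, boolean familiesAlreadyChecked) throws IOException {
  if (!familiesAlreadyChecked) {
    for (byte[] family : scan.getFamilyMap().keySet()) {
      checkFamily(family);
    }
  }
  RegionScanner s = coprocessorHost.preScannerOpen(scan); // may alter scan families
  if (s == null) {
    for (byte[] family : scan.getFamilyMap().keySet()) {
      checkFamily(family); // recheck post-coprocessor, as noted above
    }
    s = instantiateRegionScanner(scan);
  }
  return s;
}
{code}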
[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers
[ https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767606#comment-15767606 ] Hudson commented on HBASE-17328: FAILURE: Integrated in Jenkins build HBase-0.98-on-Hadoop-1.1 #1299 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1299/]) HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 5ea953b0115ac814f67e4fb076b2fdce85dd22cf) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java > Properly dispose of looped replication peers > > > Key: HBASE-17328 > URL: https://issues.apache.org/jira/browse/HBASE-17328 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 1.4.0, 0.98.23 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9 > > Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, > HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, > HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, > HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch > > > When adding a looped replication peer (clusterId == peerClusterId), the > following code terminates the replication source thread, but since the source > manager still holds a reference, WALs continue to get enqueued, and never get > cleaned because they're stuck in the queue, leading to an unsustainable > buildup. Furthermore, the replication statistics thread will continue to > print statistics for the terminated source. > {code} > if (clusterId.equals(peerClusterId) && > !replicationEndpoint.canReplicateToSameCluster()) { > this.terminate("ClusterId " + clusterId + " is replicating to itself: > peerClusterId " > + peerClusterId + " which is not allowed by ReplicationEndpoint:" > + replicationEndpoint.getClass().getName(), null, false); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
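For context, a hedged sketch of the shape of the fix: the source must not only be terminated, the manager also has to release its reference. The closeQueue() call below is an invented name, not necessarily what the committed patch does:
{code}
// Illustrative fragment: in addition to terminating the self-replicating
// source, drop the manager's reference so WALs stop being enqueued for it
// and the stats thread stops reporting the dead source.
if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
      + peerClusterId + " which is not allowed by ReplicationEndpoint:"
      + replicationEndpoint.getClass().getName(), null, false);
  this.manager.closeQueue(this); // hypothetical: release the queue and reference
  return;
}
{code}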
[jira] [Commented] (HBASE-16008) A robust way deal with early termination of HBCK
[ https://issues.apache.org/jira/browse/HBASE-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767607#comment-15767607 ] Hudson commented on HBASE-16008: FAILURE: Integrated in Jenkins build HBase-0.98-on-Hadoop-1.1 #1299 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1299/]) HBASE-16008 A robust way deal with early termination of HBCK (Stephen (apurtell: rev f63b5a0db9e630af69654fca59cf7ab3f724245f) * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * (edit) hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/SnapshotProtos.java * (edit) hbase-protocol/src/main/protobuf/Master.proto * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * (add) hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterMaintenanceModeTracker.java * (edit) hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java > A robust way deal with early termination of HBCK > > > Key: HBASE-16008 > URL: https://issues.apache.org/jira/browse/HBASE-16008 > Project: HBase > Issue Type: Improvement > Components: hbck >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0, 1.4.0, 0.98.24 > > Attachments: HBASE-16008-0.98.patch, HBASE-16008.v0-master.patch, > HBASE-16008.v1-branch-1.patch, HBASE-16008.v1-master.patch > > > When HBCK is running, we want to disable Catalog Janitor, Balancer and > Split/Merge. Today, the implementation is not robust. If HBCK is terminated > early by Control-C, the changed state would not be reset to the original. > HBASE-15406 was trying to solve this problem for the Split/Merge switch. The > implementation is complicated, and it did not solve CJ and Balancer. > The proposal to solve the problem is to use a znode to indicate that HBCK > is running. CJ, balancer, and Split/Merge switch all look for this znode > before doing its operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
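A minimal sketch of the proposed znode gate, using the existing ZKUtil/ZooKeeperWatcher API with an assumed znode path:
{code}
import org.apache.hadoop.hbase.zookeeper.ZKUtil;
import org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher;
import org.apache.zookeeper.KeeperException;

// Sketch: CatalogJanitor, Balancer and the split/merge switch would consult a
// marker znode before acting. An ephemeral znode goes away with HBCK's ZK
// session even on Ctrl-C, so the cluster is never stuck in maintenance mode.
final class HbckMarker {
  static boolean isHbckInProgress(ZooKeeperWatcher zkw) throws KeeperException {
    return ZKUtil.checkExists(zkw, "/hbase/hbck-in-progress") != -1; // path is assumed
  }
}
{code}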
[jira] [Commented] (HBASE-16010) Put draining function through Admin API
[ https://issues.apache.org/jira/browse/HBASE-16010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767600#comment-15767600 ] Matt Warhaftig commented on HBASE-16010: Thanks Jerry. Feel free to fix the first issue and commit. As for the second issue, MasterRpcServices uses the fully qualified classname of {{org.apache.hadoop.hbase.shaded.protobuf.generated.HBaseProtos.ServerName}} because MasterRpcServices already has a {{ServerName}} import entry ({{org.apache.hadoop.hbase.ServerName}}) and they would conflict. > Put draining function through Admin API > --- > > Key: HBASE-16010 > URL: https://issues.apache.org/jira/browse/HBASE-16010 > Project: HBase > Issue Type: Improvement >Reporter: Jerry He >Assignee: Matt Warhaftig >Priority: Minor > Attachments: hbase-16010-v1.patch, hbase-16010-v2.patch > > > Currently, there is no Admin API for the draining function. Clients have to > interact directly with the Zookeeper draining node to add and remove draining > servers. > For example, in draining_servers.rb: > {code} > zkw = org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.new(config, > "draining_servers", nil) > parentZnode = zkw.drainingZNode > begin > for server in servers > node = ZKUtil.joinZNode(parentZnode, server) > ZKUtil.createAndFailSilent(zkw, node) > end > ensure > zkw.close() > end > {code} > This is not good in cases like secure clusters with protected Zookeeper nodes. > Let's put the draining function through the Admin API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
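To illustrate the intended end state, a hypothetical client-side sketch of draining through Admin once the API lands; drainRegionServers is an assumed method name and may not match what the patch finally exposes:
{code}
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Hypothetical usage: mark a server as draining via Admin, with no direct
// ZooKeeper access, so it also works on secure clusters with protected znodes.
public class DrainServerExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      ServerName server = ServerName.valueOf("host1.example.com,16020,1478578389506");
      admin.drainRegionServers(Collections.singletonList(server)); // assumed method name
    }
  }
}
{code}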
[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers
[ https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767562#comment-15767562 ] Hudson commented on HBASE-17328: FAILURE: Integrated in Jenkins build HBase-0.98-matrix #428 (See [https://builds.apache.org/job/HBase-0.98-matrix/428/]) HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 5ea953b0115ac814f67e4fb076b2fdce85dd22cf) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java > Properly dispose of looped replication peers > > > Key: HBASE-17328 > URL: https://issues.apache.org/jira/browse/HBASE-17328 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 1.4.0, 0.98.23 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9 > > Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, > HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, > HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, > HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch > > > When adding a looped replication peer (clusterId == peerClusterId), the > following code terminates the replication source thread, but since the source > manager still holds a reference, WALs continue to get enqueued, and never get > cleaned because they're stuck in the queue, leading to an unsustainable > buildup. Furthermore, the replication statistics thread will continue to > print statistics for the terminated source. > {code} > if (clusterId.equals(peerClusterId) && > !replicationEndpoint.canReplicateToSameCluster()) { > this.terminate("ClusterId " + clusterId + " is replicating to itself: > peerClusterId " > + peerClusterId + " which is not allowed by ReplicationEndpoint:" > + replicationEndpoint.getClass().getName(), null, false); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
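A sketch of the shape of the fix, not the committed diff: besides terminating, the looped source must also be deregistered from the source manager so its WAL queue stops growing and gets deleted. {{removeSource()}} here is a hypothetical helper standing in for whatever the patch actually calls:
{code}
// Sketch only (removeSource is hypothetical, not the committed API):
// terminate the looped source AND drop the manager's reference, so WALs
// stop being enqueued and the stale queue can be cleaned up.
if (clusterId.equals(peerClusterId) && !replicationEndpoint.canReplicateToSameCluster()) {
  this.terminate("ClusterId " + clusterId + " is replicating to itself: peerClusterId "
      + peerClusterId + " which is not allowed by ReplicationEndpoint:"
      + replicationEndpoint.getClass().getName(), null, false);
  this.manager.removeSource(this); // deregister and delete the queue
  return;
}
{code}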
[jira] [Commented] (HBASE-16008) A robust way deal with early termination of HBCK
[ https://issues.apache.org/jira/browse/HBASE-16008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767563#comment-15767563 ] Hudson commented on HBASE-16008: FAILURE: Integrated in Jenkins build HBase-0.98-matrix #428 (See [https://builds.apache.org/job/HBase-0.98-matrix/428/]) HBASE-16008 A robust way deal with early termination of HBCK (Stephen (apurtell: rev f63b5a0db9e630af69654fca59cf7ab3f724245f) * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/CatalogJanitor.java * (edit) hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/MasterProtos.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java * (add) hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/MasterMaintenanceModeTracker.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * (edit) hbase-protocol/src/main/protobuf/Master.proto * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/zookeeper/ZooKeeperWatcher.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * (edit) hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/SnapshotProtos.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java > A robust way deal with early termination of HBCK > > > Key: HBASE-16008 > URL: https://issues.apache.org/jira/browse/HBASE-16008 > Project: HBase > Issue Type: Improvement > Components: hbck >Reporter: Stephen Yuan Jiang >Assignee: Stephen Yuan Jiang > Fix For: 2.0.0, 1.4.0, 0.98.24 > > Attachments: HBASE-16008-0.98.patch, HBASE-16008.v0-master.patch, > HBASE-16008.v1-branch-1.patch, HBASE-16008.v1-master.patch > > > When HBCK is running, we want to disable Catalog Janitor, Balancer and > Split/Merge. Today, the implementation is not robust. If HBCK is terminated > early by Control-C, the changed state would not be reset to the original. > HBASE-15406 was trying to solve this problem for the Split/Merge switch. The > implementation is complicated, and it did not cover the CJ and Balancer. > The proposal to solve the problem is to use a znode to indicate that HBCK > is running. The CJ, Balancer, and Split/Merge switch all look for this znode > before doing their operations. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17069) RegionServer writes invalid META entries for split daughters in some circumstances
[ https://issues.apache.org/jira/browse/HBASE-17069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767531#comment-15767531 ] Andrew Purtell commented on HBASE-17069: FWIW I tested the head of branch-1.3 in my rig and it failed the same way, "no serialized HRegionInfo" in some rows in meta, with resulting job failure as part of the keyspace went missing. > RegionServer writes invalid META entries for split daughters in some > circumstances > -- > > Key: HBASE-17069 > URL: https://issues.apache.org/jira/browse/HBASE-17069 > Project: HBase > Issue Type: Bug >Affects Versions: 1.2.4 >Reporter: Andrew Purtell >Priority: Critical > Attachments: daughter_1_d55ef81c2f8299abbddfce0445067830.log, > daughter_2_08629d59564726da2497f70451aafcdb.log, logs.tar.gz, > parent-393d2bfd8b1c52ce08540306659624f2.log > > > I have been seeing frequent ITBLL failures testing various versions of 1.2.x. > Over the lifetime of 1.2.x the following issues have been fixed: > - HBASE-15315 (Remove always set super user call as high priority) > - HBASE-16093 (Fix splits failed before creating daughter regions leave meta > inconsistent) > And this one is pending: > - HBASE-17044 (Fix merge failed before creating merged region leaves meta > inconsistent) > I can apply all of the above to branch-1.2 and still see this failure: > *The life of stillborn region d55ef81c2f8299abbddfce0445067830* > *Master sees SPLITTING_NEW* > {noformat} > 2016-11-08 04:23:21,186 INFO [AM.ZK.Worker-pool2-t82] master.RegionStates: > Transition null to {d55ef81c2f8299abbddfce0445067830 state=SPLITTING_NEW, > ts=1478579001186, server=node-3.cluster,16020,1478578389506} > {noformat} > *The RegionServer creates it* > {noformat} > 2016-11-08 04:23:26,035 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for GomnU: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,038 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for big: blockCache=LruBlockCache{blockCount=34, > currentSize=14996112, freeSize=12823716208, maxSize=12838712320, > heapSize=14996112, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,442 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for meta: blockCache=LruBlockCache{blockCount=63, > currentSize=17187656, freeSize=12821524664, maxSize=12838712320, > heapSize=17187656, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,713 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for nwmrW: 
blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,715 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for piwbr: blockCache=LruBlockCache{blockCount=96, > currentSize=19178440, freeSize=12819533880, maxSize=12838712320, > heapSize=19178440, minSize=12196776960, minFactor=0.95, multiSize=6098388480, > multiFactor=0.5, singleSize=3049194240, singleFactor=0.25}, > cacheDataOnRead=true, cacheDataOnWrite=false, cacheIndexesOnWrite=false, > cacheBloomsOnWrite=false, cacheEvictOnClose=false, cacheDataCompressed=false, > prefetchOnOpen=false > 2016-11-08 04:23:26,717 INFO > [StoreOpener-d55ef81c2f8299abbddfce0445067830-1] hfile.CacheConfig: Created > cacheConfig for tiny: blockCache=LruBlockCache{blockCount=96, > c
[jira] [Commented] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767514#comment-15767514 ] stack commented on HBASE-17341: --- If we time out, is it a WARN or an ERROR? Do we lose data if we time out and just keep processing? Thanks. Good find. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
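The short-term fix lends itself to a small sketch: bound the wait so a stuck endpoint cannot hang the calling (possibly ZK event) thread. The timeout value, method name, and logging below are illustrative, not the patch itself:
{code}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of the short-term fix: replace a bare stopFuture.get() with a
// bounded wait so ReplicationSource#terminate() always returns.
public final class BoundedTermination {
  public static void awaitStop(Future<?> stopFuture, long timeoutMillis) {
    try {
      stopFuture.get(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // Stop waiting; the endpoint may be leaked, but the caller stays live.
      System.err.println("WARN: endpoint did not stop within " + timeoutMillis + "ms");
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    } catch (ExecutionException e) {
      System.err.println("WARN: endpoint stop failed: " + e.getCause());
    }
  }
}
{code}
This answers stack's question only partially: on timeout nothing is lost that was not already at risk, since the endpoint had stopped making progress; the timeout merely stops the ZK event thread from being held hostage.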
[jira] [Updated] (HBASE-17352) Fix hbase-assembly build with bash 4
[ https://issues.apache.org/jira/browse/HBASE-17352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-17352: --- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: 2.0.0 Status: Resolved (was: Patch Available) Thanks for the patch, Junegunn > Fix hbase-assembly build with bash 4 > > > Key: HBASE-17352 > URL: https://issues.apache.org/jira/browse/HBASE-17352 > Project: HBase > Issue Type: Bug >Reporter: Junegunn Choi >Assignee: Junegunn Choi >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-17352.patch > > > hbase-assembly fails to build with bash 4. > {noformat} > [DEBUG] Executing command line: [env, bash, -c, cat > maven-shared-archive-resources/META-INF/NOTICE \ > `find > /Users/jg/github/hbase/hbase-assembly/target/dependency -iname NOTICE -or > -iname NOTICE.txt` \] > [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec > (concat-NOTICE-files) on project hbase-assembly: Command execution failed. > Process exited with an error: 1 (Exit value: 1) -> [Help 1] > org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute > goal org.codehaus.mojo:exec-maven-plugin:1.4.0:exec (concat-NOTICE-files) on > project hbase-assembly: Command execution failed. > {noformat} > The error is caused by the trailing backslash in the bash command for > {{concat-NOTICE-files}}. You can see the behavioral difference between bash 3 > and 4 with the following snippet. > {code} > $ # Using bash 3 > $ /bin/bash -c 'cat <(echo foo) \' && echo good || echo bad > foo > good > $ # Using bash 4 > $ /usr/local/bin/bash -c 'cat <(echo foo) \' && echo good || echo bad > foo > cat: \: No such file or directory > bad > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767488#comment-15767488 ] stack commented on HBASE-17314: --- [~yangzhe1991] If you're making an addendum for the hanging test, here is some other input. HConstants is for defines that are used in many places. The preference is to keep defines beside the code where they are used, so move these to ReplicationSource: 935 public static final String REPLICATION_SOURCE_TOTAL_BUFFER_KEY = "replication.total.buffer.quota"; 936 public static final int REPLICATION_SOURCE_TOTAL_BUFFER_DFAULT = 256 * 1024 * 1024; I suggest too that the explanation you give above for why 256M be written as a comment on that define. This looks like it could be package-private rather than public: 2344 public ReplicationSourceService getReplicationSourceService() { Otherwise the patch looks good. Pity that replication is so hard to test. Any ideas on how to make it easier? The test could be hanging for any of many reasons, given you have to put up two clusters inside one JVM. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory which will increase the pressure of GC, even > maybe OOM because we will read entries for 64MB to buffer in default for one > source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
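Taken together, stack's suggestion might land roughly as below: the defines relocated to ReplicationSource with the rationale as a comment. This is a sketch, not the committed code, and the comment wording paraphrases the review discussion:
{code}
// In ReplicationSource, not HConstants, since only replication uses these.
// Default of 256MB: bounds the total heap spent on buffered entries across
// all sources, while still letting each source read ahead (illustrative
// wording; see the review discussion above for the actual rationale).
public static final String REPLICATION_SOURCE_TOTAL_BUFFER_KEY =
    "replication.total.buffer.quota";
public static final int REPLICATION_SOURCE_TOTAL_BUFFER_DFAULT = 256 * 1024 * 1024;
{code}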
[jira] [Updated] (HBASE-17341) Add a timeout during replication endpoint termination
[ https://issues.apache.org/jira/browse/HBASE-17341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-17341: --- Hadoop Flags: Reviewed Fix Version/s: 1.4.0 2.0.0 Integrated to branch-1 and master. Waiting for branch-1.3 to open. > Add a timeout during replication endpoint termination > - > > Key: HBASE-17341 > URL: https://issues.apache.org/jira/browse/HBASE-17341 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0, 1.3.0, 1.4.0, 1.1.7, 0.98.23, 1.2.4 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17341.branch-1.1.v1.patch, > HBASE-17341.branch-1.1.v2.patch, HBASE-17341.master.v1.patch, > HBASE-17341.master.v2.patch > > > In ReplicationSource#terminate(), a Future is obtained from > ReplicationEndpoint#stop(). Future.get() is then called, but can potentially > hang there if something went wrong in the endpoint stop(). > Hanging there has serious implications, because the thread could potentially > be the ZK event thread (e.g. watcher calls > ReplicationSourceManager#removePeer() -> ReplicationSource#terminate() -> > blocked). This means no other events in the ZK event queue will get > processed, which for HBase means other ZK watches such as replication watch > notifications, snapshot watch notifications, even RegionServer shutdown will > all get blocked. > The short term fix addressed here is to simply add a timeout for > Future.get(). But the severe consequences seen here perhaps suggest a > broader refactoring of the ZKWatcher usage in HBase is in order, to protect > against situations like this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17314) Limit total buffered size for all replication sources
[ https://issues.apache.org/jira/browse/HBASE-17314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767290#comment-15767290 ] Ted Yu commented on HBASE-17314: TestGlobalThrottler hangs in master build. Please investigate. > Limit total buffered size for all replication sources > - > > Key: HBASE-17314 > URL: https://issues.apache.org/jira/browse/HBASE-17314 > Project: HBase > Issue Type: Improvement > Components: Replication >Reporter: Phil Yang >Assignee: Phil Yang > Fix For: 2.0.0, 1.4.0 > > Attachments: HBASE-17314.branch-1.v01.patch, HBASE-17314.v01.patch, > HBASE-17314.v02.patch, HBASE-17314.v03.patch, HBASE-17314.v04.patch > > > If we have many peers or some servers have many recovered queues, we will > hold many entries in memory which will increase the pressure of GC, even > maybe OOM because we will read entries for 64MB to buffer in default for one > source. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767185#comment-15767185 ] Hadoop QA commented on HBASE-17345: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 23s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 0s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 51s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 51s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 46s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 3s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 46s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 24s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 26m 42s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 0s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s {color} | {color:green} hbase-client in the patch passed. 
{color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 55s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 26s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 141m 41s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.regionserver.TestGlobalThrottler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.11.2 Server=1.11.2 Image:yetus/hbase:8d52d23 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844236/HBASE-17345.patch | | JIRA Issue | HBASE-17345 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux 1a23f33321b5 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / e1f4aae | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/5014/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/5014/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Results | htt
[jira] [Commented] (HBASE-17334) Add locate row before/after support for AsyncRegionLocator
[ https://issues.apache.org/jira/browse/HBASE-17334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767119#comment-15767119 ] Yu Li commented on HBASE-17334: --- +1 on v1 patch, only a trivial question on RB, thanks. > Add locate row before/after support for AsyncRegionLocator > -- > > Key: HBASE-17334 > URL: https://issues.apache.org/jira/browse/HBASE-17334 > Project: HBase > Issue Type: Sub-task > Components: Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17334-v1.patch, HBASE-17334.patch > > > Now we only have a getPreviousRegionLocation method which is only used for > reverse scans, and it is not perfect as it cannot deal with region merges. As > we want to add inclusive/exclusive support for the start row and end row of a > scan, we need to implement a general locate-to-row-before/after method for > AsyncRegionLocator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
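What "locate the row before" must guarantee can be stated as a containment check: the located region has to cover the key range immediately preceding the given row, which a stale cache entry may fail to do after a merge. A standalone sketch of that check; the names are illustrative and not HBase API:
{code}
import java.util.Arrays;

// Sketch: a region [startKey, endKey) can serve "the row just before `row`"
// iff it reaches up to `row` (an empty endKey means the last region) and
// actually contains some key below `row`. Byte order is unsigned, as in HBase.
public final class LocateBeforeCheck {
  static boolean coversRowBefore(byte[] startKey, byte[] endKey, byte[] row) {
    boolean reachesRow = endKey.length == 0 || Arrays.compareUnsigned(endKey, row) >= 0;
    boolean hasKeyBelow = Arrays.compareUnsigned(startKey, row) < 0;
    return reachesRow && hasKeyBelow;
  }
}
{code}
If a cached or freshly located region fails this check (for example, because it was merged away), the locator must retry against meta rather than return it.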
[jira] [Commented] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767026#comment-15767026 ] Duo Zhang commented on HBASE-17345: --- Seems the review board is broken... I cannot upload the patch here; it says that I need to use --full-index. But if I upload the patch generated with --full-index, it tells me 'The specified diff file could not be parsed.' No idea... > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add the support for general batch based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17262) Refactor RpcServer so as to make it extendable and/or pluggable
[ https://issues.apache.org/jira/browse/HBASE-17262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766913#comment-15766913 ] Hadoop QA commented on HBASE-17262: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 6 new or modified test files. {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 18s {color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 50s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 29s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 27s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 40s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} master passed {color} | | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 11s {color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 55s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 50s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 27s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 28s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 32s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 51s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 91m 20s {color} | {color:red} hbase-server in the patch failed. 
{color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 20s {color} | {color:green} hbase-it in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 27s {color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 132m 42s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.replication.regionserver.TestGlobalThrottler | \\ \\ || Subsystem || Report/Notes || | Docker | Client=1.12.3 Server=1.12.3 Image:yetus/hbase:8d52d23 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12844217/HBASE-17262.master.V4.patch | | JIRA Issue | HBASE-17262 | | Optional Tests | asflicense javac javadoc unit findbugs hadoopcheck hbaseanti checkstyle compile | | uname | Linux b62bf154bd94 3.13.0-105-generic #152-Ubuntu SMP Fri Dec 2 15:37:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / e1f4aae | | Default Java | 1.8.0_111 | | findbugs | v3.0.0 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/5013/artifact/patchprocess/patch-unit-hbase-server.txt | | unit test logs | https://builds.apache.org/job/PreCommit-HBASE-Build/5013/artifact/patchprocess/patch-unit-hbase-server.txt | | Test Resu
[jira] [Created] (HBASE-17356) Add replica read support
Duo Zhang created HBASE-17356: - Summary: Add replica read support Key: HBASE-17356 URL: https://issues.apache.org/jira/browse/HBASE-17356 Project: HBase Issue Type: Sub-task Reporter: Duo Zhang For scans at least, I think we can do better, as we now pass the mvcc to the client. We can use the mvcc to determine whether we can get a consistent view when reading from replicas other than the primary. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
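The consistency idea sketched in one line, speculatively: a replica can serve a consistent continuation of a scan if it has replayed edits at least up to the scan's established mvcc read point. Neither name below is a real HBase API; both values are assumed to be obtainable:
{code}
// Speculative sketch of the idea above. `scanMvccReadPoint` is the read
// point the scan was established at; `replicaMaxReplayedMvcc` is how far
// the replica has caught up. Both names are illustrative.
public final class ReplicaReadCheck {
  static boolean replicaIsConsistentFor(long scanMvccReadPoint, long replicaMaxReplayedMvcc) {
    return replicaMaxReplayedMvcc >= scanMvccReadPoint;
  }
}
{code}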
[jira] [Updated] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-17345: -- Attachment: HBASE-17345.patch A first version. It implements a general batch method, and all the other operations (multi get, multi put, multi delete) depend on it. Will add more comments and tests in the next patch. > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add the support for general batch based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
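The structure described, sketched as an interface: one general batch() that the typed convenience methods delegate to. The generic signature mimics the async client style but is an illustration, not the patch:
{code}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Row;

// Sketch: a single general batch(), with multi get/put expressed as
// delegations. Get and Put both implement Row, so one entry point covers
// heterogeneous action lists; names and shapes are illustrative.
interface BatchCapableTable {
  <T> List<CompletableFuture<T>> batch(List<? extends Row> actions);

  default List<CompletableFuture<Result>> get(List<Get> gets) {
    return this.<Result>batch(gets);
  }

  default List<CompletableFuture<Void>> put(List<Put> puts) {
    return this.<Void>batch(puts);
  }
}
{code}
Returning one future per action, rather than one for the whole batch, lets callers handle partial failure per row, which matches how multi operations behave server-side.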
[jira] [Updated] (HBASE-17345) Implement batch
[ https://issues.apache.org/jira/browse/HBASE-17345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Duo Zhang updated HBASE-17345: -- Assignee: Duo Zhang Status: Patch Available (was: Open) > Implement batch > --- > > Key: HBASE-17345 > URL: https://issues.apache.org/jira/browse/HBASE-17345 > Project: HBase > Issue Type: Sub-task > Components: asyncclient, Client >Affects Versions: 2.0.0 >Reporter: Duo Zhang >Assignee: Duo Zhang > Fix For: 2.0.0 > > Attachments: HBASE-17345.patch > > > Add the support for general batch based on the code introduced in HBASE-17142. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-17328) Properly dispose of looped replication peers
[ https://issues.apache.org/jira/browse/HBASE-17328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766854#comment-15766854 ] Hudson commented on HBASE-17328: SUCCESS: Integrated in Jenkins build HBase-1.3-JDK7 #73 (See [https://builds.apache.org/job/HBase-1.3-JDK7/73/]) HBASE-17328 Properly dispose of looped replication peers (apurtell: rev 7b3187c1a02eb875b2ba2daa49d43738f4dce8f8) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestMasterReplication.java > Properly dispose of looped replication peers > > > Key: HBASE-17328 > URL: https://issues.apache.org/jira/browse/HBASE-17328 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 2.0.0, 1.4.0, 0.98.23 >Reporter: Vincent Poon >Assignee: Vincent Poon >Priority: Critical > Fix For: 2.0.0, 1.3.0, 1.4.0, 1.2.5, 0.98.24, 1.1.9 > > Attachments: HBASE-17328-1.1.v1.patch, HBASE-17328-master.v1.patch, > HBASE-17328-master.v2.patch, HBASE-17328.0.98.v4.patch, > HBASE-17328.branch-1.1.v2.patch, HBASE-17328.branch-1.1.v3.patch, > HBASE-17328.branch-1.1.v4.patch, HBASE-17328.master.v4.patch > > > When adding a looped replication peer (clusterId == peerClusterId), the > following code terminates the replication source thread, but since the source > manager still holds a reference, WALs continue to get enqueued, and never get > cleaned because they're stuck in the queue, leading to an unsustainable > buildup. Furthermore, the replication statistics thread will continue to > print statistics for the terminated source. > {code} > if (clusterId.equals(peerClusterId) && > !replicationEndpoint.canReplicateToSameCluster()) { > this.terminate("ClusterId " + clusterId + " is replicating to itself: > peerClusterId " > + peerClusterId + " which is not allowed by ReplicationEndpoint:" > + replicationEndpoint.getClass().getName(), null, false); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11392) add/remove peer requests should be routed through master
[ https://issues.apache.org/jira/browse/HBASE-11392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766831#comment-15766831 ] Hudson commented on HBASE-11392: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #2171 (See [https://builds.apache.org/job/HBase-Trunk_matrix/2171/]) HBASE-11392 add/remove peer requests should be routed through master (zghao: rev e1f4aaeacdcbaffb02a08c29493601547c381941) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/MockNoopMasterServices.java * (add) hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/ReplicationProtos.java * (add) hbase-protocol-shaded/src/main/protobuf/Replication.proto * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterCoprocessorHost.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/RequestConverter.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/MasterObserver.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/replication/ReplicationAdmin.java * (edit) hbase-protocol-shaded/src/main/java/org/apache/hadoop/hbase/shaded/protobuf/generated/MasterProtos.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java * (edit) src/main/asciidoc/_chapters/appendix_acl_matrix.adoc * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/replication/ReplicationFactory.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestSerialReplication.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelReplicationWithExpAsString.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationBase.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/security/visibility/TestVisibilityLabelsReplication.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/security/access/TestAccessController.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/Admin.java * (edit) hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/client/replication/TestReplicationAdmin.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationWithTags.java * (edit) hbase-protocol-shaded/src/main/protobuf/Master.proto * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterRpcServices.java * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterServices.java * (add) hbase-server/src/main/java/org/apache/hadoop/hbase/master/replication/ReplicationManager.java > add/remove peer requests should be routed through master > > > Key: HBASE-11392 > URL: https://issues.apache.org/jira/browse/HBASE-11392 > Project: HBase > Issue Type: Sub-task >Reporter: Enis Soztutar >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-11392-v1.patch, HBASE-11392-v2.patch, > HBASE-11392-v3.patch, HBASE-11392-v4.patch, HBASE-11392-v5.patch, > HBASE-11392-v6.patch > > > ReplicationAdmin directly operates over the zookeeper data for replication > setup. 
We should move these operations to be routed through the master for two > reasons: > - Replication implementation details are exposed to the client. We should move > most of the replication-related classes to the hbase-server package. > - Routing the requests through the master is the standard practice for all other > operations. It allows for decoupling implementation details from the > client code. > Review board: https://reviews.apache.org/r/54730/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
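Once routed through the master, the client side collapses to Admin calls with no ZooKeeper coupling. A sketch under the assumption that the Admin methods land as named in this patch; exact signatures may differ when committed:
{code}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

// Sketch: peer management via master-backed Admin calls instead of direct
// ZooKeeper writes. Cluster key and peer id are placeholder values.
public class PeerAdminExample {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      ReplicationPeerConfig peerConfig = new ReplicationPeerConfig()
          .setClusterKey("zk1.example.com:2181:/hbase");
      admin.addReplicationPeer("peer1", peerConfig); // master RPC, not ZK
      admin.removeReplicationPeer("peer1");
    }
  }
}
{code}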