[jira] [Updated] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace
[ https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinxin fan updated HBASE-19336: --- Summary: Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace (was: Improve rsgroup to allow assign all tables within a specified namespace from one group to another ) > Improve rsgroup to allow assign all tables within a specified namespace by > only writing namespace > - > > Key: HBASE-19336 > URL: https://issues.apache.org/jira/browse/HBASE-19336 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Affects Versions: 2.0.0-alpha-4 > Reporter: xinxin fan > Assignee: xinxin fan > Attachments: HBASE-19336-master.patch > > > Currently, users can only assign tables within a namespace from one group to > another by writing all table names in the move_tables_rsgroup command. Allowing > users to assign all tables within a specified namespace by writing only the > namespace name is useful. > Usage as follows: > {code:java} > hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] > Took 2.2211 seconds > {code} > {code:java} > hbase(main):051:0* move_servers_tables_rsgroup > 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] > Took 15.3710 seconds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
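A note on the shell examples above: an argument beginning with '@' names a namespace rather than a single table, so a mixed list such as ['@ns1','@ns2','table3'] expands to every table in ns1 and ns2 plus the literal table3. A minimal Python sketch of that expansion, assuming a hypothetical catalog mapping namespaces to their tables (illustrative only, not HBase's rsgroup code):

```python
def expand_table_specs(specs, tables_by_namespace):
    """Expand a mixed list of table names and '@namespace' specifiers.

    A spec starting with '@' stands for every table in that namespace;
    anything else is taken as a literal table name.
    """
    expanded = []
    for spec in specs:
        if spec.startswith('@'):
            expanded.extend(tables_by_namespace.get(spec[1:], []))
        else:
            expanded.append(spec)
    return expanded

# Hypothetical catalog of namespaces to fully qualified table names.
catalog = {'ns1': ['ns1:t1', 'ns1:t2'], 'ns2': ['ns2:t1']}
print(expand_table_specs(['@ns1', 'table3'], catalog))
# → ['ns1:t1', 'ns1:t2', 'table3']
```

Unknown namespaces expand to nothing here; the real command would presumably reject them instead.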
[jira] [Commented] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace by only writing namespace
[ https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263913#comment-16263913 ] xinxin fan commented on HBASE-19336: Here is the patch. > Improve rsgroup to allow assign all tables within a specified namespace by > only writing namespace > - > > Key: HBASE-19336 > URL: https://issues.apache.org/jira/browse/HBASE-19336 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Affects Versions: 2.0.0-alpha-4 > Reporter: xinxin fan > Assignee: xinxin fan > Attachments: HBASE-19336-master.patch > > > Currently, users can only assign tables within a namespace from one group to > another by writing all table names in the move_tables_rsgroup command. Allowing > users to assign all tables within a specified namespace by writing only the > namespace name is useful. > Usage as follows: > {code:java} > hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] > Took 2.2211 seconds > {code} > {code:java} > hbase(main):051:0* move_servers_tables_rsgroup > 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] > Took 15.3710 seconds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another
[ https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-19336 started by xinxin fan. -- > Improve rsgroup to allow assign all tables within a specified namespace from > one group to another > -- > > Key: HBASE-19336 > URL: https://issues.apache.org/jira/browse/HBASE-19336 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Affects Versions: 2.0.0-alpha-4 > Reporter: xinxin fan > Assignee: xinxin fan > Attachments: HBASE-19336-master.patch > > > Currently, users can only assign tables within a namespace from one group to > another by writing all table names in the move_tables_rsgroup command. Allowing > users to assign all tables within a specified namespace by writing only the > namespace name is useful. > Usage as follows: > {code:java} > hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] > Took 2.2211 seconds > {code} > {code:java} > hbase(main):051:0* move_servers_tables_rsgroup > 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] > Took 15.3710 seconds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another
[ https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinxin fan updated HBASE-19336: --- Attachment: HBASE-19336-master.patch > Improve rsgroup to allow assign all tables within a specified namespace from > one group to another > -- > > Key: HBASE-19336 > URL: https://issues.apache.org/jira/browse/HBASE-19336 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Affects Versions: 2.0.0-alpha-4 > Reporter: xinxin fan > Assignee: xinxin fan > Attachments: HBASE-19336-master.patch > > > Currently, users can only assign tables within a namespace from one group to > another by writing all table names in the move_tables_rsgroup command. Allowing > users to assign all tables within a specified namespace by writing only the > namespace name is useful. > Usage as follows: > {code:java} > hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] > Took 2.2211 seconds > {code} > {code:java} > hbase(main):051:0* move_servers_tables_rsgroup > 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] > Took 15.3710 seconds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another
xinxin fan created HBASE-19336: -- Summary: Improve rsgroup to allow assign all tables within a specified namespace from one group to another Key: HBASE-19336 URL: https://issues.apache.org/jira/browse/HBASE-19336 Project: HBase Issue Type: Improvement Components: rsgroup Affects Versions: 2.0.0-alpha-4 Reporter: xinxin fan Currently, users can only assign tables within a namespace from one group to another by writing all table names in the move_tables_rsgroup command. Allowing users to assign all tables within a specified namespace by writing only the namespace name is useful. Usage as follows: {code:java} hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] Took 2.2211 seconds {code} {code:java} hbase(main):051:0* move_servers_tables_rsgroup 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] Took 15.3710 seconds {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HBASE-19336) Improve rsgroup to allow assign all tables within a specified namespace from one group to another
[ https://issues.apache.org/jira/browse/HBASE-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xinxin fan reassigned HBASE-19336: -- Assignee: xinxin fan > Improve rsgroup to allow assign all tables within a specified namespace from > one group to another > -- > > Key: HBASE-19336 > URL: https://issues.apache.org/jira/browse/HBASE-19336 > Project: HBase > Issue Type: Improvement > Components: rsgroup > Affects Versions: 2.0.0-alpha-4 > Reporter: xinxin fan > Assignee: xinxin fan > > Currently, users can only assign tables within a namespace from one group to > another by writing all table names in the move_tables_rsgroup command. Allowing > users to assign all tables within a specified namespace by writing only the > namespace name is useful. > Usage as follows: > {code:java} > hbase(main):055:0> move_tables_rsgroup 'default',['@ns1'] > Took 2.2211 seconds > {code} > {code:java} > hbase(main):051:0* move_servers_tables_rsgroup > 'rsgroup1',['hbase39.lt.163.org:60020'],['@ns1','@ns2','table3'] > Took 15.3710 seconds > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer
[ https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16868: --- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks all for reviewing. Pushed to master and branch-2. > Add a replicate_all flag to avoid misuse the namespaces and table-cfs config > of replication peer > > > Key: HBASE-16868 > URL: https://issues.apache.org/jira/browse/HBASE-16868 > Project: HBase > Issue Type: Improvement > Components: Replication > Affects Versions: 2.0.0, 3.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-16868.master.001.patch, > HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, > HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, > HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, > HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, > HBASE-16868.master.010.patch, HBASE-16868.master.011.patch > > > First add a new peer by shell cmd. > {code} > add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase" > {code} > If we don't set namespaces and table-cfs in the peer config, it means all > tables are replicated to the peer cluster. > Then append a table to the peer config. > {code} > append_peer_tableCFs '1', {"table1" => []} > {code} > Then this peer will only replicate table1 to the peer cluster. It changes from > replicating all tables in the cluster to replicating only one table. This is very > easy to misuse in a production cluster. So we should avoid appending tables to a > peer that replicates all tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
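The guard the issue describes, refusing to append table-cfs to a peer that still replicates everything, can be sketched as follows. This is an illustrative Python model with hypothetical names, not the actual ReplicationPeerConfig API:

```python
class PeerConfigError(Exception):
    """Raised when a change would silently shrink a peer's replication scope."""

def append_table_cfs(peer, table_cfs):
    # A peer with replicate_all set replicates every table. Appending
    # table-cfs to it would flip it to replicating only those tables,
    # which is exactly the misuse the flag is meant to prevent.
    if peer.get('replicate_all', False):
        raise PeerConfigError(
            "peer replicates all tables; disable replicate_all first")
    peer.setdefault('table_cfs', {}).update(table_cfs)
    return peer

peer = {'cluster_key': 'server1.cie.com:2181:/hbase', 'replicate_all': True}
try:
    append_table_cfs(peer, {'table1': []})
except PeerConfigError as err:
    print('rejected:', err)
```

The point of the explicit flag is that narrowing the scope now requires two deliberate steps (clear replicate_all, then append) instead of one easy-to-misfire command.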
[jira] [Commented] (HBASE-15320) HBase connector for Kafka Connect
[ https://issues.apache.org/jira/browse/HBASE-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263901#comment-16263901 ] Ted Yu commented on HBASE-15320: Thanks for the hard work, Mike. Please wait for more reviews from other committers. > HBase connector for Kafka Connect > - > > Key: HBASE-15320 > URL: https://issues.apache.org/jira/browse/HBASE-15320 > Project: HBase > Issue Type: New Feature > Components: Replication >Reporter: Andrew Purtell >Assignee: Mike Wingert > Labels: beginner > Fix For: 3.0.0 > > Attachments: HBASE-15320.master.1.patch, HBASE-15320.master.2.patch, > HBASE-15320.master.3.patch, HBASE-15320.master.4.patch, > HBASE-15320.master.5.patch, HBASE-15320.master.6.patch, > HBASE-15320.master.7.patch, HBASE-15320.master.8.patch, > HBASE-15320.master.8.patch, HBASE-15320.master.9.patch, HBASE-15320.pdf, > HBASE-15320.pdf > > > Implement an HBase connector with source and sink tasks for the Connect > framework (http://docs.confluent.io/2.0.0/connect/index.html) available in > Kafka 0.9 and later. > See also: > http://www.confluent.io/blog/announcing-kafka-connect-building-large-scale-low-latency-data-pipelines > An HBase source > (http://docs.confluent.io/2.0.0/connect/devguide.html#task-example-source-task) > could be implemented as a replication endpoint or WALObserver, publishing > cluster wide change streams from the WAL to one or more topics, with > configurable mapping and partitioning of table changes to topics. > An HBase sink task > (http://docs.confluent.io/2.0.0/connect/devguide.html#sink-tasks) would > persist, with optional transformation (JSON? Avro?, map fields to native > schema?), Kafka SinkRecords into HBase tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
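The "configurable mapping and partitioning of table changes to topics" that the source task proposes could look roughly like this sketch. All names here are illustrative assumptions, not the connector's actual configuration:

```python
from zlib import crc32

def route_edit(table, row_key, topic_map, num_partitions):
    # Topic comes from a per-table override map, falling back to a
    # derived name; the partition is keyed on the row so all edits to
    # a given row land in one partition and keep their WAL order.
    topic = topic_map.get(table, f"hbase.{table}")
    partition = crc32(row_key.encode()) % num_partitions
    return topic, partition

print(route_edit('orders', 'row-17', {'orders': 'orders-cdc'}, 8))
```

Keying the partition on the row (rather than, say, the region) is one way to preserve per-row ordering across region splits and moves.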
[jira] [Commented] (HBASE-19266) TestAcidGuarantees should cover adaptive in-memory compaction
[ https://issues.apache.org/jira/browse/HBASE-19266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263888#comment-16263888 ] Ted Yu commented on HBASE-19266: w.r.t. the 'memstoreSize to a negative value' error, the first occurrence is in TestAcidGuarantees#testMixedAtomicity. However, if I run the subtest alone, it passes with the EAGER policy. > TestAcidGuarantees should cover adaptive in-memory compaction > - > > Key: HBASE-19266 > URL: https://issues.apache.org/jira/browse/HBASE-19266 > Project: HBase > Issue Type: Test > Reporter: Ted Yu > Assignee: Chia-Ping Tsai > Priority: Minor > Attachments: HBASE-19266.v0.patch > > > Currently TestAcidGuarantees covers 3 (in-memory) compaction policies. > Adaptive in-memory compaction is new and should be added as the 4th compaction > policy. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19325) Pass a list of server name to postClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangxu Cheng updated HBASE-19325: -- Attachment: HBASE-19325.branch-1.001.patch The failed UT is not related. Retrying. > Pass a list of server name to postClearDeadServers > -- > > Key: HBASE-19325 > URL: https://issues.apache.org/jira/browse/HBASE-19325 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0-beta-2 > Reporter: Guangxu Cheng > Assignee: Guangxu Cheng > Attachments: HBASE-19325.branch-1.001.patch, > HBASE-19325.branch-1.001.patch, HBASE-19325.branch-2.001.patch > > > Over on the tail of HBASE-18131, [~chia7712] said > {quote} > (Revisiting the AccessController remind me of this issue) > Could we remove the duplicate code on the server side? Why not pass a list of > server name to postClearDeadServers and postListDeadServers? > {quote} > The duplicate code has been removed in HBASE-19131. Now pass a list of server > names to postClearDeadServers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer
[ https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263872#comment-16263872 ] Guanghao Zhang commented on HBASE-16868: OK. All UTs passed. Will fix the checkstyle issue on commit. > Add a replicate_all flag to avoid misuse the namespaces and table-cfs config > of replication peer > > > Key: HBASE-16868 > URL: https://issues.apache.org/jira/browse/HBASE-16868 > Project: HBase > Issue Type: Improvement > Components: Replication > Affects Versions: 2.0.0, 3.0.0 > Reporter: Guanghao Zhang > Assignee: Guanghao Zhang > Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-16868.master.001.patch, > HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, > HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, > HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, > HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, > HBASE-16868.master.010.patch, HBASE-16868.master.011.patch > > > First add a new peer by shell cmd. > {code} > add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase" > {code} > If we don't set namespaces and table-cfs in the peer config, it means all > tables are replicated to the peer cluster. > Then append a table to the peer config. > {code} > append_peer_tableCFs '1', {"table1" => []} > {code} > Then this peer will only replicate table1 to the peer cluster. It changes from > replicating all tables in the cluster to replicating only one table. This is very > easy to misuse in a production cluster. So we should avoid appending tables to a > peer that replicates all tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263871#comment-16263871 ] ramkrishna.s.vasudevan commented on HBASE-16890: I tried this out (still on a single-node cluster) {code} nohup ./hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --presplit=50 --size=50 --columns=50 --valueSize=200 --writeToWAL=true --bloomFilter=NONE randomWrite 50 {code} AsyncWAL is faster in terms of throughput (completion time). AsyncWAL Avg: 1103134ms FSHLog Avg: 1280875ms Even though we have more columns, the performance seems better for AsyncWAL. For now I have only one node. I can try with multiple nodes next week or so. Previously this one-node test showed FSHLog being faster, but now that is not the case. I have not dug into the logs like I did previously, as I am currently doing something else, but I will do that next week or so. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal > Affects Versions: 2.0.0 > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in a single-node cluster performs slower > than the default WAL. 
This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
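As a sanity check on the averages reported in the comment above, the completion times put AsyncWAL roughly 14% ahead of FSHLog on this particular single-node run:

```python
async_avg_ms = 1_103_134   # AsyncWAL average completion time from the run above
fshlog_avg_ms = 1_280_875  # FSHLog average completion time from the run above
speedup = (fshlog_avg_ms - async_avg_ms) / fshlog_avg_ms
print(f"AsyncWAL completed {speedup:.1%} faster")  # → AsyncWAL completed 13.9% faster
```

As the commenter notes, this is one run on one node with 50 columns per row, so the margin should be treated as indicative, not conclusive.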
[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned
[ https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263844#comment-16263844 ] Appy commented on HBASE-19335: -- Testing {{for i in `seq 1 10`; do mvn test -Dtest=TestRegionObserverInterface#testRecovery -pl hbase-server; done}} Without the patch, the test failed 8 times and passed 2 times (with a timeout of 60s). With the patch, the test passed 10 times. Test runtime was ~20s (per run). > Fix waitUntilAllRegionsAssigned > --- > > Key: HBASE-19335 > URL: https://issues.apache.org/jira/browse/HBASE-19335 > Project: HBase > Issue Type: Bug > Reporter: Appy > Assignee: Appy > Attachments: HBASE-19335.master.001.patch > > > Found when debugging flaky test TestRegionObserverInterface#testRecovery. > In the end, the test does the following: > - Kills the RS > - Waits for all regions to be assigned > - Some validation (unrelated) > - Cleanup: delete table. > {noformat} > cluster.killRegionServer(rs1.getRegionServer().getServerName()); > Threads.sleep(1000); // Let the kill soak in. > util.waitUntilAllRegionsAssigned(tableName); > LOG.info("All regions assigned"); > verifyMethodResult(SimpleRegionObserver.class, > new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", > "getCtPreWALRestore", > "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, > tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); > } finally { > util.deleteTable(tableName); > table.close(); > } > } > {noformat} > However, looking at test logs, found that we had overlapping Assigns with > Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out. > Assigns were from the ServerCrashRecovery and Unassigns were from the > deleteTable cleanup. > Which begs the question, why did HBTU.waitUntilAllRegionsAssigned(tableName) > not wait until recovery was complete. > Answer: Looks like that function is only meant for sunny scenarios but not > for crashes. 
It iterates over meta and just [checks for *some value* in the > server > column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] > which is obviously present and equal to the server that was just killed. > This bug must be affecting other fault tolerance tests too and fixing it may > fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
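The fix the ticket implies, counting a region as assigned only when its server column points at a currently live server, can be sketched as a stricter wait predicate. The data shapes below are simplified stand-ins for the meta scan, not HBaseTestingUtility code:

```python
def all_regions_assigned(meta_rows, live_servers):
    # A region whose server column still names a dead (just-killed)
    # server is NOT assigned yet. The original check accepted any
    # non-empty server value, which is the bug described above:
    # the stale value equals the server that was just killed.
    return all(row.get('server') and row['server'] in live_servers
               for row in meta_rows)

meta = [{'region': 'r1', 'server': 'rs1'},
        {'region': 'r2', 'server': 'rs2'}]
print(all_regions_assigned(meta, {'rs1', 'rs2'}))  # → True
print(all_regions_assigned(meta, {'rs2'}))         # rs1 was killed → False
```

With this predicate, the wait loop keeps polling until crash recovery has actually moved every region off the dead server, which is the behaviour the test needed.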
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263840#comment-16263840 ] Hadoop QA commented on HBASE-19290: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 48s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 6s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 12s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 53s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 44s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 3s{color} | {color:red} hbase-server: The patch generated 1 new + 5 unchanged - 0 fixed = 6 total (was 5) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 3s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 54m 7s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 31s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 94m 24s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 19s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}168m 41s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19290 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898980/HBASE-19290.master.004.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux df6057c46798 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh | | git revision | master / cdc2bb17ff | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/9984/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/9984/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output |
[jira] [Commented] (HBASE-18946) Stochastic load balancer assigns replica regions to the same RS
[ https://issues.apache.org/jira/browse/HBASE-18946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263832#comment-16263832 ] ramkrishna.s.vasudevan commented on HBASE-18946: bq. Looking in the patch, are we doing assign placement inside in CreateTableProcedure still? No, we are not doing assign placement in the create table proc. We are just separating out the regions so that primaries are assigned first and then replicas. There is no state maintained in the LB or in the assign process like in the first patch. bq. Could we pass the AM the new table regions and ask it to return us plans to use assigning? I think that is what we are doing now in this patch, right - we pass the regions and get the right server for them, ensuring replicas don't sit together. > Stochastic load balancer assigns replica regions to the same RS > --- > > Key: HBASE-18946 > URL: https://issues.apache.org/jira/browse/HBASE-18946 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0-alpha-3 > Reporter: ramkrishna.s.vasudevan > Assignee: ramkrishna.s.vasudevan > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-18946.patch, HBASE-18946.patch, > HBASE-18946_2.patch, HBASE-18946_2.patch, > TestRegionReplicasWithRestartScenarios.java > > > Trying out region replicas and their assignment, I can see that sometimes the > default Stochastic load balancer assigns replica regions to the same RS. > This happens when we have 3 RSs checked in and we have a table with 3 > replicas. When an RS goes down, replicas being assigned to the same RS is > acceptable, but when we have enough RSs to assign to, this behaviour is > undesirable and defeats the purpose of replicas. > [~huaxiang] and [~enis]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
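The two-phase idea in the comment above (assign all primaries first, then replicas, never landing a copy on a server that already hosts one for the same region) can be sketched as follows. The helper is hypothetical, not the Stochastic balancer's actual code:

```python
from itertools import cycle

def place_with_replicas(regions, num_replicas, servers):
    """Round-robin placement: all primaries in pass 0, then one pass
    per replica rank, skipping any server already hosting a copy of
    the region so replicas never sit together when servers suffice."""
    placement = {r: [] for r in regions}
    rotation = cycle(servers)
    for _ in range(1 + num_replicas):
        for region in regions:
            # Try at most len(servers) candidates to find one that
            # does not already hold a copy of this region.
            for _ in range(len(servers)):
                candidate = next(rotation)
                if candidate not in placement[region]:
                    placement[region].append(candidate)
                    break
    return placement

placement = place_with_replicas(['r1', 'r2'], 2, ['s1', 's2', 's3'])
print(placement)  # each region's 3 copies land on 3 distinct servers
```

With 3 servers and 3 copies per region this always spreads every region across all 3 servers; with fewer servers than copies, co-location is unavoidable, matching the "RS goes down" case the issue calls acceptable.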
[jira] [Updated] (HBASE-19335) Fix waitUntilAllRegionsAssigned
[ https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-19335: - Status: Patch Available (was: Open) > Fix waitUntilAllRegionsAssigned > --- > > Key: HBASE-19335 > URL: https://issues.apache.org/jira/browse/HBASE-19335 > Project: HBase > Issue Type: Bug > Reporter: Appy > Assignee: Appy > Attachments: HBASE-19335.master.001.patch > > > Found when debugging flaky test TestRegionObserverInterface#testRecovery. > In the end, the test does the following: > - Kills the RS > - Waits for all regions to be assigned > - Some validation (unrelated) > - Cleanup: delete table. > {noformat} > cluster.killRegionServer(rs1.getRegionServer().getServerName()); > Threads.sleep(1000); // Let the kill soak in. > util.waitUntilAllRegionsAssigned(tableName); > LOG.info("All regions assigned"); > verifyMethodResult(SimpleRegionObserver.class, > new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", > "getCtPreWALRestore", > "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, > tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); > } finally { > util.deleteTable(tableName); > table.close(); > } > } > {noformat} > However, looking at test logs, found that we had overlapping Assigns with > Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out. > Assigns were from the ServerCrashRecovery and Unassigns were from the > deleteTable cleanup. > Which begs the question, why did HBTU.waitUntilAllRegionsAssigned(tableName) > not wait until recovery was complete. > Answer: Looks like that function is only meant for sunny scenarios but not > for crashes. It iterates over meta and just [checks for *some value* in the > server > column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] > which is obviously present and equal to the server that was just killed. 
> This bug must be affecting other fault tolerance tests too and fixing it may > fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19335) Fix waitUntilAllRegionsAssigned
[ https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-19335: - Attachment: HBASE-19335.master.001.patch > Fix waitUntilAllRegionsAssigned > --- > > Key: HBASE-19335 > URL: https://issues.apache.org/jira/browse/HBASE-19335 > Project: HBase > Issue Type: Bug > Reporter: Appy > Assignee: Appy > Attachments: HBASE-19335.master.001.patch > > > Found when debugging flaky test TestRegionObserverInterface#testRecovery. > In the end, the test does the following: > - Kills the RS > - Waits for all regions to be assigned > - Some validation (unrelated) > - Cleanup: delete table. > {noformat} > cluster.killRegionServer(rs1.getRegionServer().getServerName()); > Threads.sleep(1000); // Let the kill soak in. > util.waitUntilAllRegionsAssigned(tableName); > LOG.info("All regions assigned"); > verifyMethodResult(SimpleRegionObserver.class, > new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", > "getCtPreWALRestore", > "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, > tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); > } finally { > util.deleteTable(tableName); > table.close(); > } > } > {noformat} > However, looking at test logs, found that we had overlapping Assigns with > Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out. > Assigns were from the ServerCrashRecovery and Unassigns were from the > deleteTable cleanup. > Which begs the question, why did HBTU.waitUntilAllRegionsAssigned(tableName) > not wait until recovery was complete. > Answer: Looks like that function is only meant for sunny scenarios but not > for crashes. It iterates over meta and just [checks for *some value* in the > server > column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] > which is obviously present and equal to the server that was just killed. 
> This bug must be affecting other fault tolerance tests too and fixing it may > fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263830#comment-16263830 ] Hadoop QA commented on HBASE-19325: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 16s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-1 Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 31s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 7s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 29s{color} | {color:green} branch-1 passed with JDK v1.8.0_141 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 20s{color} | {color:green} branch-1 passed with JDK v1.7.0_151 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 3s{color} | {color:green} branch-1 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 41s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 3m 12s{color} | {color:red} hbase-server in branch-1 has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 24s{color} | {color:green} branch-1 passed with JDK v1.8.0_141 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s{color} | {color:green} branch-1 passed with JDK v1.7.0_151 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 20s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 43s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 27s{color} | {color:green} the patch passed with JDK v1.8.0_141 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 27s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 22s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 22s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 2m 3s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 37s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 41m 0s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 3s{color} | {color:green} the patch passed with JDK v1.8.0_141 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s{color} | {color:green} the patch passed with JDK v1.7.0_151 {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 26m 5s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 2s{color} | {color:green} hbase-rsgroup in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. {color} | |
[jira] [Commented] (HBASE-19319) Fix bug in synchronizing over ProcedureEvent
[ https://issues.apache.org/jira/browse/HBASE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263828#comment-16263828 ] Hadoop QA commented on HBASE-19319: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 35s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 5 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 22s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 27s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 6m 8s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 39s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 13s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 33s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 54s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 14s{color} | {color:red} hbase-procedure: The patch generated 3 new + 14 unchanged - 2 fixed = 17 total (was 16) {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 3s{color} | {color:green} hbase-server: The patch generated 0 new + 55 unchanged - 1 fixed = 55 total (was 56) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 1s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 57s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 54m 13s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 41s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 10s{color} | {color:green} hbase-procedure in the patch passed. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 96m 18s{color} | {color:red} hbase-server in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 41s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}177m 40s{color} | {color:black} {color} | \\ \\ || Reason || Tests || | Failed junit tests | hadoop.hbase.regionserver.TestHRegionWithInMemoryFlush | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19319 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898978/HBASE-19319.master.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 9e9f334e579b 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux | | Build tool | maven |
[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer
[ https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263826#comment-16263826 ] Hadoop QA commented on HBASE-16868: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 20s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 10 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 39s{color} | {color:blue} Maven dependency ordering for branch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 8s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 4m 38s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 8m 37s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 3m 47s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue} 0m 37s{color} | {color:blue} Maven dependency ordering for patch {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 9m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} cc {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 5m 12s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 2m 3s{color} | {color:red} hbase-server: The patch generated 1 new + 217 unchanged - 6 fixed = 218 total (was 223) {color} | | {color:red}-1{color} | {color:red} rubocop {color} | {color:red} 0m 17s{color} | {color:red} The patch generated 9 new + 295 unchanged - 25 fixed = 304 total (was 320) {color} | | {color:red}-1{color} | {color:red} ruby-lint {color} | {color:red} 0m 16s{color} | {color:red} The patch generated 43 new + 315 unchanged - 28 fixed = 358 total (was 343) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 8m 22s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 99m 4s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. 
{color} | | {color:green}+1{color} | {color:green} hbaseprotoc {color} | {color:green} 2m 20s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 44s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 35s{color} | {color:green} hbase-protocol-shaded in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 3m 3s{color} | {color:green} hbase-client in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 15s{color} | {color:green} hbase-replication in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green}120m 27s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 39s{color} | {color:green} hbase-shell in the patch passed. {color} | |
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263825#comment-16263825 ] ramkrishna.s.vasudevan commented on HBASE-16890: One question on the YCSB run: since you measure write performance, is the batching of mutations disabled? I think only then can we measure the correct latency, right? > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in a single-node cluster performs more slowly > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19159) Backup should check permission for snapshot copy in advance
[ https://issues.apache.org/jira/browse/HBASE-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263819#comment-16263819 ] Ted Yu commented on HBASE-19159: I took a brief look at how Hadoop tests a similar scenario. Please refer to: hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestAclWithSnapshot.java {code} private static void assertDirPermissionDenied(FileSystem fs, UserGroupInformation user, Path pathToCheck) throws Exception { try { fs.listStatus(pathToCheck); fail("expected AccessControlException for user " + user + ", path = " + pathToCheck); } catch (AccessControlException e) { // expected } } {code} See if you can borrow something from the above test. > Backup should check permission for snapshot copy in advance > --- > > Key: HBASE-19159 > URL: https://issues.apache.org/jira/browse/HBASE-19159 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu >Assignee: Janos Gub >Priority: Minor > Attachments: initial_patch.txt > > > When the user running the backup doesn't have permission to copy the snapshot, > he/she would see: > {code} > 2017-11-02 18:21:33,654 ERROR [main] util.AbstractHBaseTool: Error running > command-line tool > org.apache.hadoop.hbase.snapshot.ExportSnapshotException: Failed to copy the > snapshot directory: > from=hdfs://ctr-e134-1499953498516-263664-01-03.hwx.site:8020/apps/hbase/data/.hbase-snapshot/snapshot_1509646891251_default_IntegrationTestBackupRestore.table2 > > to=hdfs://ctr-e134-1499953498516-263664-01-03.hwx.site:8020/user/root/test-data/fb919a6f-3cb4-4d57-bbcf-561d6e5b3ae8/backupIT/backup_1509646884252/default/IntegrationTestBackupRestore.table2/.hbase-snapshot/.tmp/snapshot_1509646891251_default_IntegrationTestBackupRestore.table2 > at > org.apache.hadoop.hbase.snapshot.ExportSnapshot.doWork(ExportSnapshot.java:1009) > at > org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154) > at > 
org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:386) > at > org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.snapshotCopy(FullTableBackupClient.java:103) > at > org.apache.hadoop.hbase.backup.impl.FullTableBackupClient.execute(FullTableBackupClient.java:175) > at > org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:601) > at > org.apache.hadoop.hbase.IntegrationTestBackupRestore.runTest(IntegrationTestBackupRestore.java:180) > at > org.apache.hadoop.hbase.IntegrationTestBackupRestore.testBackupRestore(IntegrationTestBackupRestore.java:134) > at > org.apache.hadoop.hbase.IntegrationTestBackupRestore.runTestFromCommandLine(IntegrationTestBackupRestore.java:263) > {code} > It would be more user friendly if the permission is checked before taking the > snapshot. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
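The improvement requested in HBASE-19159 amounts to a fail-fast pre-flight check: verify the destination is writable before launching the expensive snapshot copy. A rough local-filesystem sketch of the same idea, using java.nio rather than HDFS (the class and method names here are illustrative, not HBase or Hadoop APIs):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class PreflightCheckSketch {
    // Fail fast: confirm the destination is an existing, writable directory
    // before kicking off an expensive copy job.
    static boolean canCopyTo(Path dest) {
        return Files.isDirectory(dest) && Files.isWritable(dest);
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempDirectory("preflight");
        System.out.println(canCopyTo(tmp));                    // writable temp dir
        System.out.println(canCopyTo(tmp.resolve("missing"))); // nonexistent path
    }
}
```

On HDFS, Hadoop's FileSystem also exposes an access(path, action) probe that could serve the same purpose, though whether the attached patch uses it is not shown in this thread.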
[jira] [Commented] (HBASE-19092) Make Tag IA.LimitedPrivate and expose for CPs
[ https://issues.apache.org/jira/browse/HBASE-19092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263817#comment-16263817 ] ramkrishna.s.vasudevan commented on HBASE-19092: Thanks for the reviews. Any other comments? I will commit it and then work on branch-2 patch. > Make Tag IA.LimitedPrivate and expose for CPs > - > > Key: HBASE-19092 > URL: https://issues.apache.org/jira/browse/HBASE-19092 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19092-branch-2.patch, > HBASE-19092-branch-2_5.patch, HBASE-19092-branch-2_5.patch, > HBASE-19092.branch-2.0.02.patch, HBASE-19092_001-branch-2.patch, > HBASE-19092_001.patch, HBASE-19092_002-branch-2.patch, HBASE-19092_002.patch, > HBASE-19092_004.patch, HBASE-19092_3.patch, HBASE-19092_4.patch > > > We need to make tags as LimitedPrivate as some use cases are trying to use > tags like timeline server. The same topic was discussed in dev@ and also in > HBASE-18995. > Shall we target this for beta1 - cc [~saint@gmail.com]. > So once we do this all related Util methods and APIs should also move to > LimitedPrivate Util classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19092) Make Tag IA.LimitedPrivate and expose for CPs
[ https://issues.apache.org/jira/browse/HBASE-19092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263816#comment-16263816 ] ramkrishna.s.vasudevan commented on HBASE-19092: Yes, getType should be in RawCell, but that deals with the actual type and not the byte, so I thought those were unrelated to this JIRA's description. bq.Returning a RawCellBuilder sounds good. In it you would not allow option for setting sequenceid? Yes. Don't allow seqId to be set in the CP context. > Make Tag IA.LimitedPrivate and expose for CPs > - > > Key: HBASE-19092 > URL: https://issues.apache.org/jira/browse/HBASE-19092 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19092-branch-2.patch, > HBASE-19092-branch-2_5.patch, HBASE-19092-branch-2_5.patch, > HBASE-19092.branch-2.0.02.patch, HBASE-19092_001-branch-2.patch, > HBASE-19092_001.patch, HBASE-19092_002-branch-2.patch, HBASE-19092_002.patch, > HBASE-19092_004.patch, HBASE-19092_3.patch, HBASE-19092_4.patch > > > We need to make tags as LimitedPrivate as some use cases are trying to use > tags like timeline server. The same topic was discussed in dev@ and also in > HBASE-18995. > Shall we target this for beta1 - cc [~saint@gmail.com]. > So once we do this all related Util methods and APIs should also move to > LimitedPrivate Util classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19112) Suspect methods on Cell to be deprecated
[ https://issues.apache.org/jira/browse/HBASE-19112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263813#comment-16263813 ] ramkrishna.s.vasudevan commented on HBASE-19112: Since there is no assignee here, can I take this up as a follow-up of the RawCell work being added in HBASE-19092? [~saint@gmail.com], [~chia7712]? > Suspect methods on Cell to be deprecated > > > Key: HBASE-19112 > URL: https://issues.apache.org/jira/browse/HBASE-19112 > Project: HBase > Issue Type: Bug > Components: Client >Reporter: Josh Elser >Priority: Blocker > Fix For: 2.0.0-beta-1 > > > [~chia7712] suggested on the [mailing > list|https://lists.apache.org/thread.html/e6de9af26d9b888a358ba48bf74655ccd893573087c032c0fcf01585@%3Cdev.hbase.apache.org%3E] > that we have some methods on Cell which should be deprecated for removal: > * {{#getType()}} > * {{#getTimestamp()}} > * {{#getTag()}} > * {{#getSequenceId()}} > Let's make a pass over these (and maybe the rest) to make sure that there > aren't others which are either implementation details or methods returning > now-private-marked classes. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19221) NoClassDefFoundError: org/hamcrest/SelfDescribing while running IT tests in 2.0-alpha4
[ https://issues.apache.org/jira/browse/HBASE-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263810#comment-16263810 ] ramkrishna.s.vasudevan commented on HBASE-19221: I don't know. I need to check with the recent branch-2, but alpha4 had the issue. > NoClassDefFoundError: org/hamcrest/SelfDescribing while running IT tests in > 2.0-alpha4 > -- > > Key: HBASE-19221 > URL: https://issues.apache.org/jira/browse/HBASE-19221 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-alpha-3 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > > Copying the mail from the dev@ > {code} > I tried running some IT test cases using the alpha-4 RC. I found this issue > Exception in thread "main" java.lang.NoClassDefFoundError: > org/hamcrest/SelfDescribing > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:760) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:467) > at java.net.URLClassLoader.access$100(URLClassLoader.java:73) > at java.net.URLClassLoader$1.run(URLClassLoader.java:368) > at java.net.URLClassLoader$1.run(URLClassLoader.java:362) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:361) > ... >at > org.apache.hadoop.hbase.IntegrationTestsDriver.doWork(IntegrationTestsDriver.java:111) > at > org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:154) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84) > at > org.apache.hadoop.hbase.IntegrationTestsDriver.main(IntegrationTestsDriver.java:47) > The same test, when run against the latest master, runs without any issues > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263808#comment-16263808 ] Ted Yu commented on HBASE-19290: bq. The WAL split speed was stable at 0.2TB/minute The above is convincing. > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe that once the cluster has 1000+ nodes and hundreds of nodes abort > and start split log, the split is very slow, and we find the regionserver and > master waiting on the zookeeper response, so we need to reduce zookeeper > requests and pressure for big clusters. > (1) Reduce requests to rsZNode: every time, calculateAvailableSplitters will > get rsZNode's children from zookeeper; when the cluster is huge, this is heavy. > This patch reduces those requests. > (2) When the regionserver has the max split tasks running, it may still try to > grab tasks and issue zookeeper requests; we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
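Point (2) of HBASE-19290 is essentially a bounded-concurrency gate: a worker at its split-task limit should block until a slot frees up instead of repeatedly polling ZooKeeper for tasks. A minimal sketch of that gating with a Semaphore (the class and method names are hypothetical, not the actual SplitLogWorker code):

```java
import java.util.concurrent.Semaphore;

public class SplitTaskSlots {
    private final Semaphore slots;

    SplitTaskSlots(int maxSplitTasks) {
        this.slots = new Semaphore(maxSplitTasks);
    }

    // A worker grabs a task only when a slot is free; at capacity it should
    // sleep/block here rather than re-issuing requests to ZooKeeper.
    boolean tryGrab() {
        return slots.tryAcquire();
    }

    // Called when a split task finishes, freeing a slot for the next grab.
    void finish() {
        slots.release();
    }

    public static void main(String[] args) {
        SplitTaskSlots worker = new SplitTaskSlots(2);
        System.out.println(worker.tryGrab());  // slot 1 taken
        System.out.println(worker.tryGrab());  // slot 2 taken
        System.out.println(worker.tryGrab());  // false: at capacity, wait
        worker.finish();                       // a split task completed
        System.out.println(worker.tryGrab());  // a slot is free again
    }
}
```

In the real worker the blocking variant (acquire) would replace the busy retry loop, so no ZooKeeper traffic is generated while the worker is saturated.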
[jira] [Commented] (HBASE-19318) MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase AccessController implementation
[ https://issues.apache.org/jira/browse/HBASE-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263807#comment-16263807 ] ramkrishna.s.vasudevan commented on HBASE-19318: Patch LGTM. So the intention is to see if the cluster has AccessControl services installed, and if so the Ranger CP will take the necessary actions before the ACL CP is invoked, right? On this: bq.confirmed for me that hbase:acl does somehow get created with Ranger Is it right to do this on the Ranger side? > MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase > AccessController implementation > - > > Key: HBASE-19318 > URL: https://issues.apache.org/jira/browse/HBASE-19318 > Project: HBase > Issue Type: Bug > Components: master, security >Reporter: Sharmadha Sainath >Assignee: Josh Elser >Priority: Critical > Fix For: 1.4.0, 1.3.2, 1.2.7, 2.0.0-beta-1 > > Attachments: HBASE-19318.001.branch-2.patch > > > Sharmadha brought a failure to my attention trying to use Ranger with HBase > 2.0 where the {{grant}} command was erroring out unexpectedly. The cluster > had the Ranger-specific coprocessors deployed, per what was previously > working on the HBase 1.1 line. > After some digging, I found that the Master is actually making a check > explicitly for a Coprocessor that has the name > {{org.apache.hadoop.hbase.security.access.AccessController}} (short name or > full name), instead of looking for a deployed coprocessor which can be > assigned to {{AccessController}} (which is what Ranger does). We have the > CoprocessorHost methods to do the latter already implemented; it strikes me > that we just accidentally used the wrong method in MasterRpcServices. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
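The mismatch described in HBASE-19318 (matching a coprocessor by exact class name rather than by assignability) can be reproduced in miniature. The classes below are stand-ins for illustration only, not the real HBase or Ranger types:

```java
public class CoprocessorCheckSketch {
    // Stand-ins: a third-party authorizer is assignable to the stock
    // AccessController but carries a different class name.
    static class AccessController {}
    static class RangerAuthorizer extends AccessController {}

    // Buggy check: exact class-name comparison misses subclasses.
    static boolean foundByName(Object cp) {
        return cp.getClass().getName().equals(AccessController.class.getName());
    }

    // Fixed check: assignability finds any deployed implementation.
    static boolean foundByType(Object cp) {
        return AccessController.class.isAssignableFrom(cp.getClass());
    }

    public static void main(String[] args) {
        Object deployed = new RangerAuthorizer();
        System.out.println(foundByName(deployed)); // false: capability missing, grant errors out
        System.out.println(foundByType(deployed)); // true: coprocessor recognized
    }
}
```

This is why switching MasterRpcServices to the assignability-based CoprocessorHost lookup fixes the Ranger deployment without touching Ranger itself.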
[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned
[ https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263803#comment-16263803 ] Appy commented on HBASE-19335: -- Running TestRegionObserverInterface on a local machine took 84 sec (after the change). There are ~10 tests, each with a 5 min individual timeout. Too much. The test class is labelled MediumTests; let's use that and our standard procedure, category-based timeout. 3 min per test function should be enough even on slower Apache machines. Removing individual timeouts and using CategoryBasedTimeout. > Fix waitUntilAllRegionsAssigned > --- > > Key: HBASE-19335 > URL: https://issues.apache.org/jira/browse/HBASE-19335 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > > Found when debugging flaky test TestRegionObserverInterface#testRecovery. > In the end, the test does the following: > - Kills the RS > - Waits for all regions to be assigned > - Some validation (unrelated) > - Cleanup: delete table. > {noformat} > cluster.killRegionServer(rs1.getRegionServer().getServerName()); > Threads.sleep(1000); // Let the kill soak in. > util.waitUntilAllRegionsAssigned(tableName); > LOG.info("All regions assigned"); > verifyMethodResult(SimpleRegionObserver.class, > new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", > "getCtPreWALRestore", > "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, > tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); > } finally { > util.deleteTable(tableName); > table.close(); > } > } > {noformat} > However, looking at test logs, found that we had overlapping Assigns with > Unassigns. As a result, regions ended up 'stuck in RIT' and the test timed out. > Assigns were from the ServerCrashRecovery and Unassigns were from the > deleteTable cleanup. > Which begs the question: why did HBTU.waitUntilAllRegionsAssigned(tableName) > not wait until recovery was complete? 
> Answer: Looks like that function is only meant for sunny scenarios but not > for crashes. It iterates over meta and just [checks for *some value* in the > server > column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] > which is obviously present and equal to the server that was just killed. > This bug must be affecting other fault tolerance tests too and fixing it may > fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
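The answer above also suggests the shape of the fix: a region should only count as assigned when its server column names a regionserver that is currently alive, not merely when the column has *some value*. A toy model of that predicate (names are hypothetical; the real check lives in HBaseTestingUtility):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class AssignmentWaitSketch {
    // A region counts as assigned only if its meta server column names a
    // live server; a column still pointing at a just-killed server keeps
    // the wait loop running until recovery reassigns the region.
    static boolean allRegionsAssigned(Map<String, String> regionToServer,
                                      Set<String> liveServers) {
        for (String server : regionToServer.values()) {
            if (server == null || !liveServers.contains(server)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("region-a", "rs1");
        meta.put("region-b", "rs2");
        Set<String> live = new HashSet<>(Arrays.asList("rs1", "rs2"));
        System.out.println(allRegionsAssigned(meta, live)); // true: all live
        live.remove("rs2"); // rs2 was just killed; meta still names it
        System.out.println(allRegionsAssigned(meta, live)); // false: keep waiting
    }
}
```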
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263801#comment-16263801 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-1.4 #1024 (See [https://builds.apache.org/job/HBase-1.4/1024/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev c9246588ec35aca5d89db98dba2b8d1fa38dfd31) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
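The one-line fix in HBASE-19332 is easy to demonstrate: summing a few multi-gigabyte WAL sizes into an int wraps past Integer.MAX_VALUE, while a long holds the true total. A minimal illustration (the 1.6 GB per-WAL figure is made up):

```java
public class WalSizeOverflowDemo {
    // Accumulate per-WAL sizes in an int: wraps past Integer.MAX_VALUE.
    static int sumAsInt(long sizeEach, int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            total += (int) sizeEach;
        }
        return total;
    }

    // Same accumulation in a long: stays correct.
    static long sumAsLong(long sizeEach, int count) {
        long total = 0L;
        for (int i = 0; i < count; i++) {
            total += sizeEach;
        }
        return total;
    }

    public static void main(String[] args) {
        long walSize = 1_600_000_000L; // ~1.6 GB per WAL, purely illustrative
        System.out.println(sumAsInt(walSize, 2));  // negative: int overflowed
        System.out.println(sumAsLong(walSize, 2)); // 3200000000: correct total
    }
}
```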
[jira] [Updated] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19300: --- Assignee: Ted Yu Status: Patch Available (was: Open) > TestMultithreadedTableMapper fails in branch-1.4 > > > Key: HBASE-19300 > URL: https://issues.apache.org/jira/browse/HBASE-19300 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Ted Yu > Attachments: 19300.branch-1.4.patch > > > From > https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/ > : > {code} > java.lang.AssertionError > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136) > {code} > I ran the test locally which failed. > Noticed the following in test output: > {code} > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely at client. Resetting the scanner to scan again. > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely at client. Resetting the scanner to scan again. > 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] > protobuf.ResponseConverter(432): Exception while reading cells from > result. Resetting the scanner to scan again. > org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. > But only got 0 results completely at client. Resetting the scanner to scan > again. 
> at > org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:426) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 2017-11-18 19:28:14,464 ERROR [hconnection-0x11db8653-shared--pool24-t2] > protobuf.ResponseConverter(432): Exception while reading cells from > result.Resetting the scanner toscan again. 
> java.io.EOFException: Partial cell read > at > org.apache.hadoop.hbase.codec.BaseDecoder.rethrowEofException(BaseDecoder.java:86) > at org.apache.hadoop.hbase.codec.BaseDecoder.advance(BaseDecoder.java:70) > at > org.apache.hadoop.hbase.protobuf.ResponseConverter.getResults(ResponseConverter.java:419) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:284) > at > org.apache.hadoop.hbase.client.ScannerCallable.call(ScannerCallable.java:62) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:219) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:388) > at > org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.call(ScannerCallableWithReplicas.java:362) > at > org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:142) > at > org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Premature EOF from inputStream > at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:202) > at org.apache.hadoop.hbase.KeyValueUtil.iscreate(KeyValueUtil.java:611) > at >
[jira] [Updated] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-19300: --- Attachment: 19300.branch-1.4.patch This patch allows the test to pass. > TestMultithreadedTableMapper fails in branch-1.4 > > > Key: HBASE-19300 > URL: https://issues.apache.org/jira/browse/HBASE-19300 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > Attachments: 19300.branch-1.4.patch > > > From > https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/ > : > {code} > java.lang.AssertionError > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136) > {code} > I ran the test locally which failed. > Noticed the following in test output: > {code} > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] > protobuf.ResponseConverter(432): Exception while reading cells from > result.Resetting the scanner toscan again. > org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. > But only got 0 results completely at client. Resetting the scanner to scan > again. 
[jira] [Commented] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263796#comment-16263796 ] Ted Yu commented on HBASE-19300: As far as I can tell, synchronizing on outer (the context) is correct: {code} public void run(Context context) throws IOException, InterruptedException { outer = context; {code} I don't know why error-prone flagged {{synchronized (outer)}} > TestMultithreadedTableMapper fails in branch-1.4 > > > Key: HBASE-19300 > URL: https://issues.apache.org/jira/browse/HBASE-19300 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > > From > https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/ > : > {code} > java.lang.AssertionError > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136) > {code} > I ran the test locally which failed. > Noticed the following in test output: > {code} > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] > protobuf.ResponseConverter(432): Exception while reading cells from > result.Resetting the scanner toscan again. 
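Ted Yu's question above is about error-prone flagging {{synchronized (outer)}}. Error-prone's SynchronizeOnNonFinalField check warns whenever the lock expression is a non-final field, because a reassignment can leave two threads holding different lock objects. The sketch below (illustrative, not HBase code; class and method names are invented) shows why the check is conservative: it fires even when, as in MultithreadedTableMapper, the field is assigned exactly once before worker threads start.

```java
// Minimal sketch of the pattern error-prone warns about: synchronizing on a
// non-final field. If the field is ever reassigned, threads can end up
// locking different objects and mutual exclusion silently breaks.
public class OuterLockSketch {
    private Object outer = new Object(); // non-final: SynchronizeOnNonFinalField fires here

    Object currentLock() {
        return outer;
    }

    void reassign() {
        // After this, a thread blocked on the old object and a new caller
        // synchronize on different locks.
        outer = new Object();
    }

    void guarded() {
        synchronized (outer) {
            // Safe only if 'outer' is assigned once before any thread gets here.
        }
    }
}
```

If the field is written once in a setup method (as with {{outer = context}} in {{run()}}) and never again, the warning is a false positive, but the tool cannot prove that statically.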
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263797#comment-16263797 ] ramkrishna.s.vasudevan commented on HBASE-16890: I think I need to repeat my test here. I checked the old comments and found one thing that was missing in recent tests: increasing the number of columns. I will do that now and report back; I think that was the difference. The YCSB reports suggest that throughput decreases, though only slightly, compared to FSHLog. That seems to be the opposite of what I got. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in a single-node cluster performs slower > than the default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19300) TestMultithreadedTableMapper fails in branch-1.4
[ https://issues.apache.org/jira/browse/HBASE-19300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263793#comment-16263793 ] Ted Yu commented on HBASE-19300: After a brief bisect (cutting the patch in half and shrinking the number of files touched), I narrowed down to the changes in MultithreadedTableMapper.java. With the changes, the test times out. > TestMultithreadedTableMapper fails in branch-1.4 > > > Key: HBASE-19300 > URL: https://issues.apache.org/jira/browse/HBASE-19300 > Project: HBase > Issue Type: Test >Reporter: Ted Yu > > From > https://builds.apache.org/job/HBase-1.4/1023/jdk=JDK_1_7,label=Hadoop&&!H13/testReport/org.apache.hadoop.hbase.mapreduce/TestMultithreadedTableMapper/testMultithreadedTableMapper/ > : > {code} > java.lang.AssertionError > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.verify(TestMultithreadedTableMapper.java:195) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.runTestOnTable(TestMultithreadedTableMapper.java:163) > at > org.apache.hadoop.hbase.mapreduce.TestMultithreadedTableMapper.testMultithreadedTableMapper(TestMultithreadedTableMapper.java:136) > {code} > I ran the test locally which failed. > Noticed the following in test output: > {code} > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t9] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:13,929 ERROR [hconnection-0x11db8653-shared--pool24-t3] > protobuf.ResponseConverter(425): Results sent from server=703. But only got 0 > results completely atclient. Resetting the scanner to scan again. > 2017-11-18 19:28:14,461 ERROR [hconnection-0x11db8653-shared--pool24-t8] > protobuf.ResponseConverter(432): Exception while reading cells from > result.Resetting the scanner toscan again. > org.apache.hadoop.hbase.DoNotRetryIOException: Results sent from server=703. 
[jira] [Commented] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster
[ https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263791#comment-16263791 ] Hudson commented on HBASE-19317: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4101/]) HBASE-19317 Set a high NodeManager max disk utilization if not already (elserj: rev 6f0c9fbfd1f17f5f50d90464866d153286b051a5) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java > Increase > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > to avoid host-related failures on MiniMRCluster > > > Key: HBASE-19317 > URL: https://issues.apache.org/jira/browse/HBASE-19317 > Project: HBase > Issue Type: Bug > Components: integration tests, test >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19317.001.branch-2.patch, > HBASE-19317.002.branch-2.patch > > > YARN (2.7.4, at least) defaults to asserting at least 10% of the disk usage > free on the local machine in order for the NodeManagers to function. > On my development machine, despite having over 50G free, I would see the > warning from the NM that all the local dirs were bad which would cause the > test to become stuck waiting to submit a mapreduce job. Surefire would > eventually kill the process. > We should increase this value to avoid it causing us headache. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
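The fix described above sets the YARN ceiling only when the caller has not already configured it. A minimal sketch of that "set if unset" pattern follows; it uses plain {{java.util.Properties}} as a stand-in for Hadoop's {{Configuration}} object (the actual patch edits HBaseTestingUtility), and the 99.0 ceiling and helper name are illustrative assumptions.

```java
import java.util.Properties;

public class NodeManagerDiskConfig {
    // Real YARN property named in the issue title.
    static final String MAX_DISK_UTIL_KEY =
        "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage";

    // Raise the NodeManager disk-utilization ceiling only when the caller has
    // not set it, so an explicit user choice is never overridden.
    static void raiseDiskUtilizationCeiling(Properties conf) {
        if (conf.getProperty(MAX_DISK_UTIL_KEY) == null) {
            conf.setProperty(MAX_DISK_UTIL_KEY, "99.0");
        }
    }
}
```

With the default YARN value of 90%, a development machine that is 91% full (even with 50 GB free) fails the disk health check and the MiniMRCluster hangs; raising the ceiling for tests avoids that.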
[jira] [Commented] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context
[ https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263790#comment-16263790 ] Hudson commented on HBASE-19310: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4101/]) HBASE-19310 Avoid an NPE IntegrationTestImportTsv when outside of the (elserj: rev b0b606429339aabe9fb964af6bf3c3129b3ac375) * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java > Verify IntegrationTests don't rely on Rules outside of JUnit context > > > Key: HBASE-19310 > URL: https://issues.apache.org/jira/browse/HBASE-19310 > Project: HBase > Issue Type: Bug > Components: integration tests >Reporter: Romil Choksi >Assignee: Josh Elser >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19310.001.branch-2.patch, > HBASE-19310.002.branch-2.patch > > > {noformat} > 2017-11-16 00:43:41,204 INFO [main] mapreduce.IntegrationTestImportTsv: > Running test testGenerateAndLoad. > Exception in thread "main" java.lang.NullPointerException > at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239) > {noformat} > (Potential line-number skew) > {code} > @Test > public void testGenerateAndLoad() throws Exception { > LOG.info("Running test testGenerateAndLoad."); > final TableName table = TableName.valueOf(name.getMethodName()); > {code} > The JUnit framework sets the test method name inside of the JUnit {{Rule}}. 
> When we invoke the test directly (ala {{hbase > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this > {{getMethodName()}} returns {{null}} and we get the above stacktrace. > Should make a pass over the ITs with main methods and {{Rule}}'s to make sure > we don't have this lurking. Another alternative is to just remove the main > methods and just force use of {{IntegrationTestsDriver}} instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
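The failure mode above is that JUnit's {{TestName}} rule only populates the method name inside a runner, so {{getMethodName()}} is null when the class is launched from {{main()}}. A hedged sketch of the null-guard idea, with no JUnit dependency and an invented class and fallback name (the actual patch may differ):

```java
// Sketch: derive a table name that works both under a JUnit runner (rule
// populated) and from a plain main() (rule returns null).
public class TestNameGuard {
    static String tableNameFor(String junitMethodName) {
        // Outside a JUnit context, TestName.getMethodName() returns null and
        // TableName.valueOf(null) would throw the NullPointerException seen
        // in the stacktrace above; fall back to a fixed, illustrative name.
        return junitMethodName != null ? junitMethodName : "testGenerateAndLoad";
    }
}
```

The alternative mentioned in the issue, removing the main methods and forcing use of IntegrationTestsDriver, sidesteps the problem by guaranteeing a JUnit context.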
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263792#comment-16263792 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4101 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4101/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev cdc2bb17ff38dcbd273cf501aea565006e995a06) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
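The overflow described above is easy to reproduce: total WAL bytes in a replication queue routinely exceed Integer.MAX_VALUE (about 2.1 GB). A small self-contained illustration (the class and sizes are invented for the example; the real fix is a one-word type change in DumpReplicationQueues):

```java
// Summing 64-bit file sizes into an int wraps past 2^31-1; a long does not.
public class WalSizeSum {
    static int sumAsInt(long[] walSizes) {
        int total = 0;
        for (long s : walSizes) {
            total += s; // compound assignment silently narrows long -> int
        }
        return total;
    }

    static long sumAsLong(long[] walSizes) {
        long total = 0;
        for (long s : walSizes) {
            total += s;
        }
        return total;
    }
}
```

Two 1.5 GB WALs already push the int accumulator negative, which is exactly the misreported total the issue describes.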
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263773#comment-16263773 ] Yu Li commented on HBASE-19290: --- Let me try to add more information: once, when upgrading the HDFS version, the NN had a fencing problem that caused all RSs to abort one by one. After HDFS was restored and the HBase cluster was restarted, we observed Master threads waiting for ZooKeeper to return: {noformat} "MASTER_SERVER_OPERATIONS-hdpet2mainsem2:60100-28"#2236 prio=5 os_prio=0 tid=0x7ff526bad800 nid=0xa890 in Object.wait() [0x7ff5150f6000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342) - locked <0x0005d9c720d0> (a org.apache.zookeeper.ClientCnxn$Packet) at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1470) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295) at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:635) at org.apache.hadoop.hbase.coordination.ZKSplitLogManagerCoordination.remainingTasksInCoordination(ZKSplitLogManagerCoordination.java:150) at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:353) - locked <0x0006440826e8> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch) at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:274) {noformat} After investigation we found the root cause: the splitWAL znode contained too many children, making the {{getChildren}} call very time-consuming. After some further discussion on how to resolve the issue, we think the most efficient way is to reduce the speed of publishing split tasks, or rather to publish a task only when there is an available WAL splitter. 
Publishing tasks aggressively helps nothing; it only slows down the {{getChildren}} operation on splitWAL, and thus everything else. After the patched version went online, we encountered another disaster case (unfortunately...) and experienced no more zk contention problems. The WAL split speed was stable at 0.2 TB/minute. So we don't have formal performance testing results, but the theory is borne out by real-world observation, and we hope this is convincing (smile). > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observed that once the cluster has 1000+ nodes and hundreds of nodes abort > and do split log, the split is very, very slow, and we find the > regionserver and master waiting on the zookeeper response, so we need to reduce > zookeeper requests and pressure for big clusters. > (1) Reduce requests to rsZNode: every time, calculateAvailableSplitters > gets rsZNode's children from zookeeper; when the cluster is huge, this is heavy. > This patch reduces those requests. > (2) When the regionserver has the maximum number of split tasks running, it may still try to > grab tasks and issue zookeeper requests; we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
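The throttling idea in point (2) of the issue description, skipping the ZooKeeper task-grab request while all splitter slots are busy, can be sketched with a counting semaphore. This is a hedged illustration, not the HBase implementation; the class and method names are invented.

```java
import java.util.concurrent.Semaphore;

// Sketch: a region server with all split-task slots occupied should not issue
// further ZK requests to grab tasks; it waits until a slot frees up.
public class SplitTaskThrottle {
    private final Semaphore splitterSlots;

    SplitTaskThrottle(int maxConcurrentSplitTasks) {
        this.splitterSlots = new Semaphore(maxConcurrentSplitTasks);
    }

    /** True if a slot was free; only then may the caller contact ZK for a task. */
    boolean tryStartTask() {
        return splitterSlots.tryAcquire();
    }

    /** Called when a split task completes, freeing a slot for the next grab. */
    void finishTask() {
        splitterSlots.release();
    }
}
```

Gating the grab attempt locally like this is what keeps hundreds of saturated region servers from hammering the splitWAL znode with requests they cannot act on.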
[jira] [Commented] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster
[ https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263766#comment-16263766 ] Hudson commented on HBASE-19317: FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See [https://builds.apache.org/job/HBase-2.0/899/]) HBASE-19317 Set a high NodeManager max disk utilization if not already (elserj: rev 4e387a948fad9928c9d4922c9055e601d22e4145) * (edit) hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java > Increase > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > to avoid host-related failures on MiniMRCluster > > > Key: HBASE-19317 > URL: https://issues.apache.org/jira/browse/HBASE-19317 > Project: HBase > Issue Type: Bug > Components: integration tests, test >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19317.001.branch-2.patch, > HBASE-19317.002.branch-2.patch > > > YARN (2.7.4, at least) defaults to asserting at least 10% of the disk usage > free on the local machine in order for the NodeManagers to function. > On my development machine, despite having over 50G free, I would see the > warning from the NM that all the local dirs were bad which would cause the > test to become stuck waiting to submit a mapreduce job. Surefire would > eventually kill the process. > We should increase this value to avoid it causing us headache. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context
[ https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263765#comment-16263765 ] Hudson commented on HBASE-19310: FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See [https://builds.apache.org/job/HBase-2.0/899/]) HBASE-19310 Avoid an NPE IntegrationTestImportTsv when outside of the (elserj: rev 46cb5d598689577b01cc7690587ae94579b70a11) * (edit) hbase-it/src/test/java/org/apache/hadoop/hbase/mapreduce/IntegrationTestImportTsv.java > Verify IntegrationTests don't rely on Rules outside of JUnit context > > > Key: HBASE-19310 > URL: https://issues.apache.org/jira/browse/HBASE-19310 > Project: HBase > Issue Type: Bug > Components: integration tests >Reporter: Romil Choksi >Assignee: Josh Elser >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19310.001.branch-2.patch, > HBASE-19310.002.branch-2.patch > > > {noformat} > 2017-11-16 00:43:41,204 INFO [main] mapreduce.IntegrationTestImportTsv: > Running test testGenerateAndLoad. > Exception in thread "main" java.lang.NullPointerException > at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239) > {noformat} > (Potential line-number skew) > {code} > @Test > public void testGenerateAndLoad() throws Exception { > LOG.info("Running test testGenerateAndLoad."); > final TableName table = TableName.valueOf(name.getMethodName()); > {code} > The JUnit framework sets the test method name inside of the JUnit {{Rule}}. 
> When we invoke the test directly (ala {{hbase > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this > {{getMethodName()}} returns {{null}} and we get the above stacktrace. > Should make a pass over the ITs with main methods and {{Rule}}'s to make sure > we don't have this lurking. Another alternative is to just remove the main > methods and just force use of {{IntegrationTestsDriver}} instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263767#comment-16263767 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-2.0 #899 (See [https://builds.apache.org/job/HBase-2.0/899/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 135bb5583b44f207e20f2e2caf0d109903f817d4) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263760#comment-16263760 ] binlijin commented on HBASE-19290: -- {quote} The patch is adding throttling, so it's certainly changing the way things work. It's also adding the 'if' condition controlling throttling, so the choice is definitely being made by you. I suspect you are using grabbedTask=0 as a proxy for 'failed to grab task' and wait on it. But when grabbedTask =1, and we still keep failing to grab tasks, there is no throttling for that case! Hopefully that makes my question clearer? {quote} But when grabbedTask = 1 and we still keep failing to grab tasks, it will exit the for loop and enter the while (seq_start == taskReadySeq.get()) {} loop; does that case cause any problem? > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
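The throttling under discussion reduces to a small decision rule: a worker already at its concurrent-task limit should sleep instead of polling ZooKeeper, and a pass that grabs nothing should wait for the task-ready sequence to advance. A rough sketch of that control flow (names and structure are illustrative, not taken from the patch):

```java
public class SplitTaskThrottle {
    // Classify what a split-log worker should do next, so that a worker at
    // capacity stops issuing wasted ZooKeeper getChildren calls (the cost
    // the issue description calls out on big clusters).
    public static String nextAction(int runningTasks, int maxConcurrentTasks,
                                    int grabbedThisPass) {
        if (runningTasks >= maxConcurrentTasks) {
            return "SLEEP";              // at capacity: back off, stop polling zk
        }
        if (grabbedThisPass == 0) {
            return "WAIT_FOR_NEW_TASKS"; // nothing grabbable: wait on taskReadySeq
        }
        return "GRAB";                   // capacity left and tasks available
    }
}
```

The debated case above (grabbedTask = 1 but subsequent grabs failing) falls out of the loop into the wait state rather than a dedicated throttle branch.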
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263753#comment-16263753 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-1.5 #166 (See [https://builds.apache.org/job/HBase-1.5/166/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 20d811121fb38ea2fc3871dcf4b03593bd4d6b7e) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19325) Pass a list of server name to postClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guangxu Cheng updated HBASE-19325: -- Attachment: HBASE-19325.branch-1.001.patch Uploaded the branch-1 patch. > Pass a list of server name to postClearDeadServers > -- > > Key: HBASE-19325 > URL: https://issues.apache.org/jira/browse/HBASE-19325 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-beta-2 >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng > Attachments: HBASE-19325.branch-1.001.patch, > HBASE-19325.branch-2.001.patch > > > Over on the tail of HBASE-18131. [~chia7712] said > {quote} > (Revisiting the AccessController remind me of this issue) > Could we remove the duplicate code on the server side? Why not pass a list of > server name to postClearDeadServers and postListDeadServers? > {quote} > The duplicate code has been removed in HBASE-19131. Now pass a list of server > names to postClearDeadServers -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263748#comment-16263748 ] binlijin edited comment on HBASE-19290 at 11/23/17 3:27 AM: bq. The above example lasted for almost an hour. bq. With the patch, roughly how long does log splitting task last ? We do not record the new numbers and the log do not exists now. But we record that we have 7.1TB wals and split it in 40mins. was (Author: aoxiang): bq. The above example lasted for almost an hour. With the patch, roughly how long does log splitting task last ? We do not record the new numbers and the log do not exists now. But we record that we have 7.1TB wals and split it in 40mins. > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263748#comment-16263748 ] binlijin commented on HBASE-19290: -- bq. The above example lasted for almost an hour. With the patch, roughly how long does log splitting task last ? We did not record the new numbers and the logs no longer exist. But we did record that we had 7.1TB of WALs and split them in 40 minutes. > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-18309) Support multi threads in CleanerChore
[ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Li updated HBASE-18309: -- Hadoop Flags: Reviewed Release Note: After HBASE-18309 we could use multiple threads to scan archive directories (including data and oldWALs) through config hbase.cleaner.scan.dir.concurrent.size, which supports both integer (meaning the concrete size) and double (between 0 and 1, meaning ratio of available cpu cores) value and defaults to 0.5. Please take hbase.regionserver.hfilecleaner.large.thread.count and hbase.regionserver.hfilecleaner.small.thread.count into account when setting this config to avoid thread flooding. We also support using multiple threads to clean wals in a single directory through hbase.oldwals.cleaner.thread.size, 2 by default. Fix Version/s: 2.0.0-beta-1 3.0.0 Description: There is only one thread in LogCleaner to clean oldWALs and in our big cluster we find this is not enough. The number of files under oldWALs reach the max-directory-items limit of HDFS and cause region server crash, so we use multi threads for LogCleaner and the crash not happened any more. What's more, currently there's only one thread iterating the archive directory, and we could use multiple threads cleaning sub directories in parallel to speed it up. was:There is only one thread in LogCleaner to clean oldWALs and in our big cluster we find this is not enough. The number of files under oldWALs reach the max-directory-items limit of HDFS and cause region server crash, so we use multi threads for LogCleaner and the crash not happened any more. Component/s: (was: wal) [~reidchan] please check the release note and feel free to refine it. It's recommended to add release note when introducing new properties, so people could better know how to use them. 
> Support multi threads in CleanerChore > - > > Key: HBASE-18309 > URL: https://issues.apache.org/jira/browse/HBASE-18309 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: Reid Chan > Fix For: 3.0.0, 2.0.0-beta-1 > > Attachments: HBASE-18309.master.001.patch, > HBASE-18309.master.002.patch, HBASE-18309.master.004.patch, > HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, > HBASE-18309.master.007.patch, HBASE-18309.master.008.patch, > HBASE-18309.master.009.patch, HBASE-18309.master.010.patch, > HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, > space_consumption_in_archive.png > > > There is only one thread in LogCleaner to clean oldWALs and in our big > cluster we find this is not enough. The number of files under oldWALs reach > the max-directory-items limit of HDFS and cause region server crash, so we > use multi threads for LogCleaner and the crash not happened any more. > What's more, currently there's only one thread iterating the archive > directory, and we could use multiple threads cleaning sub directories in > parallel to speed it up. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
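The sizing rule in the release note above (an integer value is a concrete pool size, a double between 0 and 1 is a ratio of available cores, defaulting to 0.5) can be sketched as follows. This is an illustration of the documented semantics, not the committed CleanerChore implementation:

```java
public class CleanerPoolSize {
    // "2" -> 2 threads; "0.5" -> half the available cores (at least 1);
    // anything out of range falls back to the documented 0.5 default.
    public static int calculatePoolSize(String configValue, int availableCores) {
        if (configValue.matches("\\d+")) {
            return Integer.parseInt(configValue); // concrete size
        }
        double ratio = Double.parseDouble(configValue);
        if (ratio > 0 && ratio <= 1.0) {
            return Math.max(1, (int) (availableCores * ratio)); // ratio of cores
        }
        return Math.max(1, availableCores / 2); // 0.5 default
    }
}
```

As the release note warns, a value computed this way should be weighed against the HFile cleaner thread counts to avoid thread flooding.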
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263742#comment-16263742 ] Ted Yu commented on HBASE-19290: Lijin: The above example lasted for almost an hour. With the patch, roughly how long does log splitting task last ? Thanks > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19093) Check Admin/Table to ensure all operations go via AccessControl
[ https://issues.apache.org/jira/browse/HBASE-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263741#comment-16263741 ] Anoop Sam John commented on HBASE-19093: bq.If we add a new method to MasterRpcServices, but don't add pre/post methods to MasterObserver. So it will still miss the ACL check? Good point. I wanted to come to this jira and check the attached patch but got sidetracked by something else. I have a doubt about the general approach. The issue is that when we add new client functions (say, adding Quota things), there is a chance that we miss the ACL checks. It is not common to see hooks added around the ops but the implementation missed in AC. In fact, most of the time AC is the prompting factor for adding hooks. We recently cleaned up some hooks which were exposing too much internal stuff to CPs (around procedures, locks). All those hooks were designed so as to do some AC checks. So the problem is mostly the other way around compared to what the patch is trying to do. Not sure how we can add a test for that. > Check Admin/Table to ensure all operations go via AccessControl > --- > > Key: HBASE-19093 > URL: https://issues.apache.org/jira/browse/HBASE-19093 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: Balazs Meszaros >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19093.master.001.patch, > HBASE-19093.master.002.patch, RegionObserver.txt > > > A cursory review of Admin Interface has a bunch of methods as open, with out > AccessControl checks. For example, procedure executor has not check on it. > This issue is about given the Admin and Table Interfaces a once-over to see > what is missing and to fill in access control where missing. > This is a follow-on from work over in HBASE-19048 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263738#comment-16263738 ] Duo Zhang commented on HBASE-16890: --- The ops metrics are 81435 vs. 77108: FSHLog is 5% more, but the run time is almost the same? The unit is milliseconds, so for a 10 min run the diff is only 4ms while there is a 5% difference in ops? And the seconds metric is even stranger: 393 vs. 458, AsyncFSWAL is 16% more, yet the run-time difference is still only 3s for a 10 min run, a 0.5% difference? Could you please explain more about the results? Thanks. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263728#comment-16263728 ] Anoop Sam John commented on HBASE-16890: So as per your tests, the higher-percentile latency values are better with async WAL, but the average latency and the throughput are a bit lower. The tests Ram did were also around throughput. Initially async WAL throughput was on the lower side; later his tests did not show that. I don't know the details of those later tests. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263729#comment-16263729 ] binlijin commented on HBASE-19290: -- We observe that when the cluster is big, regionservers issue too many zookeeper requests to get availableRSs from rsZNode and also getTaskList from splitLogZNode. The more nodes there are, the heavier the availableRSs fetch becomes. > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-18309) Support multi threads in CleanerChore
[ https://issues.apache.org/jira/browse/HBASE-18309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263723#comment-16263723 ] Yu Li commented on HBASE-18309: --- Ok, let me commit this one. [~stack] I'm planning to commit this one into branch-2 too if you don't mind boss. Thanks. > Support multi threads in CleanerChore > - > > Key: HBASE-18309 > URL: https://issues.apache.org/jira/browse/HBASE-18309 > Project: HBase > Issue Type: Improvement > Components: wal >Reporter: binlijin >Assignee: Reid Chan > Attachments: HBASE-18309.master.001.patch, > HBASE-18309.master.002.patch, HBASE-18309.master.004.patch, > HBASE-18309.master.005.patch, HBASE-18309.master.006.patch, > HBASE-18309.master.007.patch, HBASE-18309.master.008.patch, > HBASE-18309.master.009.patch, HBASE-18309.master.010.patch, > HBASE-18309.master.011.patch, HBASE-18309.master.012.patch, > space_consumption_in_archive.png > > > There is only one thread in LogCleaner to clean oldWALs and in our big > cluster we find this is not enough. The number of files under oldWALs reach > the max-directory-items limit of HDFS and cause region server crash, so we > use multi threads for LogCleaner and the crash not happened any more. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263720#comment-16263720 ] binlijin commented on HBASE-19290: -- [~tedyu] bq. Assuming patch v3 is very close to the version you run in the 2000+ node production cluster, can you post some performance numbers (in terms of reduction in zookeeper requests) so that we can know its effectiveness ? We do not record the performance numbers. But without the patch we can see the HMaster get the zookeeper event very very slowly... HMaster put up split task: *2017-07-11 20:22:57,608* DEBUG [main-EventThread] coordination.SplitLogManagerCoordination: put up splitlog task at znode /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 RegionServer grab the task and done it. *2017-07-11 20:23:33,689* INFO [SplitLogWorker-hadoop1435:16020] coordination.ZkSplitLogWorkerCoordination: worker hadoop1435.et2.tbsite.net,16020,1495647366458 acquired task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 *2017-07-11 20:25:47,131* INFO [RS_LOG_REPLAY_OPS-hadoop1435:16020-1] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 to final state DONE hadoop1435.et2.tbsite.net,16020,1495647366458 HMaster get the task done event and delete it: *2017-07-11 20:49:52,879* INFO [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 entered state: DONE 
hadoop1435.et2.tbsite.net,16020,1495647366458 *2017-07-11 20:49:52,881* INFO [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 *2017-07-11 21:19:52,280* DEBUG [main-EventThread] coordination.ZKSplitLogManagerCoordination$DeleteAsyncCallback: deleted /hbase/splitWAL/WALs%2Fhadoop0448.et2.tbsite.net%2C16020%2C1495647366007-splitting%2Fhadoop0448.et2.tbsite.net%252C16020%252C1495647366007.regiongroup-2.1499768090548 > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HBASE-19334) User.runAsLoginUser does not work in AccessController because it uses a short-circuited connection
[ https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang reassigned HBASE-19334: -- Assignee: Guanghao Zhang > User.runAsLoginUser does not work in AccessController because it uses a short > circuited connection > > > Key: HBASE-19334 > URL: https://issues.apache.org/jira/browse/HBASE-19334 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > > The short-circuited connection bypasses the RPC layer, so the RPC context does > not change. It still uses the old RPC user to write the ACL table, and > User.runAsLoginUser does not work. > See AccessController's grant method: > {code} > User.runAsLoginUser(new PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > // regionEnv is set at #start. Hopefully not null at this point. > try (Table table = regionEnv.getConnection(). > getTable(AccessControlLists.ACL_TABLE_NAME)) { > > AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm, > table, > request.getMergeExistingPermissions()); > } > return null; > } > }); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19093) Check Admin/Table to ensure all operations go via AccessControl
[ https://issues.apache.org/jira/browse/HBASE-19093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263719#comment-16263719 ] Guanghao Zhang commented on HBASE-19093: If we add a new method to MasterRpcServices, but don't add pre/post methods to MasterObserver. So it will still miss the ACL check? > Check Admin/Table to ensure all operations go via AccessControl > --- > > Key: HBASE-19093 > URL: https://issues.apache.org/jira/browse/HBASE-19093 > Project: HBase > Issue Type: Sub-task >Reporter: stack >Assignee: Balazs Meszaros >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19093.master.001.patch, > HBASE-19093.master.002.patch, RegionObserver.txt > > > A cursory review of Admin Interface has a bunch of methods as open, with out > AccessControl checks. For example, procedure executor has not check on it. > This issue is about given the Admin and Table Interfaces a once-over to see > what is missing and to fill in access control where missing. > This is a follow-on from work over in HBASE-19048 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263710#comment-16263710 ] Hudson commented on HBASE-19332: SUCCESS: Integrated in Jenkins build HBase-1.3-IT #295 (See [https://builds.apache.org/job/HBase-1.3-IT/295/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 276052be990e439d74d9d7871e1242dd1b1c8de7) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263709#comment-16263709 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-1.3-JDK7 #355 (See [https://builds.apache.org/job/HBase-1.3-JDK7/355/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 276052be990e439d74d9d7871e1242dd1b1c8de7) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263707#comment-16263707 ] Hudson commented on HBASE-19332: FAILURE: Integrated in Jenkins build HBase-1.3-JDK8 #375 (See [https://builds.apache.org/job/HBase-1.3-JDK8/375/]) HBASE-19332 DumpReplicationQueues misreports total WAL size (garyh: rev 276052be990e439d74d9d7871e1242dd1b1c8de7) * (edit) hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/DumpReplicationQueues.java > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gary Helmling updated HBASE-19332: -- Resolution: Fixed Status: Resolved (was: Patch Available) Pushed to branch-1.3+. Thanks for review [~tedyu]. > DumpReplicationQueues misreports total WAL size > --- > > Key: HBASE-19332 > URL: https://issues.apache.org/jira/browse/HBASE-19332 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 1.3.1 >Reporter: Gary Helmling >Assignee: Gary Helmling >Priority: Trivial > Fix For: 2.0.0, 3.0.0, 1.3.2 > > Attachments: HBASE-19332.patch > > > DumpReplicationQueues uses an int to collect the total WAL size for a queue. > Predictably, this overflows much of the time. Let's use a long instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263699#comment-16263699 ] binlijin edited comment on HBASE-19290 at 11/23/17 2:32 AM: [~appy] bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve the code further was appropriate and made sense. bq. Here's the diff on what he was suggesting (and what i was thinking earlier). Done in HBASE-19290.master.004.patch . was (Author: aoxiang): [~appy] bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve the code further was appropriate and made sense. Here's the diff on what he was suggesting (and what i was thinking earlier). Done in HBASE-19290.master.004.patch . > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19335) Fix waitUntilAllRegionsAssigned
[ https://issues.apache.org/jira/browse/HBASE-19335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263701#comment-16263701 ] Appy commented on HBASE-19335: -- I have a patch which i'll upload shortly. > Fix waitUntilAllRegionsAssigned > --- > > Key: HBASE-19335 > URL: https://issues.apache.org/jira/browse/HBASE-19335 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > > Found when debugging flaky test TestRegionObserverInterface#testRecovery. > In the end, the test does the following: > - Kills the RS > - Waits for all regions to be assigned > - Some validation (unrelated) > - Cleanup: delete table. > {noformat} > cluster.killRegionServer(rs1.getRegionServer().getServerName()); > Threads.sleep(1000); // Let the kill soak in. > util.waitUntilAllRegionsAssigned(tableName); > LOG.info("All regions assigned"); > verifyMethodResult(SimpleRegionObserver.class, > new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", > "getCtPreWALRestore", > "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, > tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); > } finally { > util.deleteTable(tableName); > table.close(); > } > } > {noformat} > However, looking at test logs, found that we had overlapping Assigns with > Unassigns. As a result, regions ended up 'stuck in RIT' and the test timeout. > Assigns were from the ServerCrashRecovery and Unassigns were from the > deleteTable cleanup. > Which begs the question, why did HBTU.waitUntilAllRegionsAssigned(tableName) > not wait until recovery was complete. > Answer: Looks like that function is only meant for sunny scenarios but not > for crashes. It iterates over meta and just [checks for *some value* in the > server > column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] > which is obviously present and equal to the server that was just killed. 
> This bug must be affecting other fault tolerance tests too and fixing it may > fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
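The stricter check the analysis calls for can be sketched as follows (a hypothetical helper, not HBaseTestingUtility's actual code): a region should only count as assigned when meta's server column names a currently-live regionserver, rather than accepting any non-empty value, including the server that was just killed.

```java
// Hypothetical sketch of the fix direction for waitUntilAllRegionsAssigned:
// require the meta server column to name a live server, not just any value.
class AssignmentCheck {
    static boolean isAssigned(String serverFromMeta,
                              java.util.Set<String> liveServers) {
        // Pre-fix behavior effectively accepted any non-null serverFromMeta,
        // even one pointing at a dead regionserver.
        return serverFromMeta != null && liveServers.contains(serverFromMeta);
    }
}
```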
[jira] [Created] (HBASE-19335) Fix waitUntilAllRegionsAssigned
Appy created HBASE-19335: Summary: Fix waitUntilAllRegionsAssigned Key: HBASE-19335 URL: https://issues.apache.org/jira/browse/HBASE-19335 Project: HBase Issue Type: Bug Reporter: Appy Assignee: Appy Found when debugging flaky test TestRegionObserverInterface#testRecovery. In the end, the test does the following: - Kills the RS - Waits for all regions to be assigned - Some validation (unrelated) - Cleanup: delete table. {noformat} cluster.killRegionServer(rs1.getRegionServer().getServerName()); Threads.sleep(1000); // Let the kill soak in. util.waitUntilAllRegionsAssigned(tableName); LOG.info("All regions assigned"); verifyMethodResult(SimpleRegionObserver.class, new String[] { "getCtPreReplayWALs", "getCtPostReplayWALs", "getCtPreWALRestore", "getCtPostWALRestore", "getCtPrePut", "getCtPostPut" }, tableName, new Integer[] { 1, 1, 2, 2, 0, 0 }); } finally { util.deleteTable(tableName); table.close(); } } {noformat} However, looking at test logs, found that we had overlapping Assigns with Unassigns. As a result, regions ended up 'stuck in RIT' and the test timeout. Assigns were from the ServerCrashRecovery and Unassigns were from the deleteTable cleanup. Which begs the question, why did HBTU.waitUntilAllRegionsAssigned(tableName) not wait until recovery was complete. Answer: Looks like that function is only meant for sunny scenarios but not for crashes. It iterates over meta and just [checks for *some value* in the server column|https://github.com/apache/hbase/blob/cdc2bb17ff38dcbd273cf501aea565006e995a06/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java#L3421] which is obviously present and equal to the server that was just killed. This bug must be affecting other fault tolerance tests too and fixing it may fix more than just one test, hopefully. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-19290: - Attachment: HBASE-19290.master.004.patch > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch, > HBASE-19290.master.004.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19290) Reduce zk request when doing split log
[ https://issues.apache.org/jira/browse/HBASE-19290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263699#comment-16263699 ] binlijin commented on HBASE-19290: -- [~appy] bq. Yeah, they are definitely clear, but Jingyun Tian's suggestion to improve the code further was appropriate and made sense. Here's the diff on what he was suggesting (and what i was thinking earlier). Done in HBASE-19290.master.004.patch . > Reduce zk request when doing split log > -- > > Key: HBASE-19290 > URL: https://issues.apache.org/jira/browse/HBASE-19290 > Project: HBase > Issue Type: Improvement >Reporter: binlijin >Assignee: binlijin > Attachments: HBASE-19290.master.001.patch, > HBASE-19290.master.002.patch, HBASE-19290.master.003.patch > > > We observe once the cluster has 1000+ nodes and when hundreds of nodes abort > and doing split log, the split is very very slow, and we find the > regionserver and master wait on the zookeeper response, so we need to reduce > zookeeper request and pressure for big cluster. > (1) Reduce request to rsZNode, every time calculateAvailableSplitters will > get rsZNode's children from zookeeper, when cluster is huge, this is heavy. > This patch reduce the request. > (2) When the regionserver has max split tasks running, it may still trying to > grab task and issue zookeeper request, we should sleep and wait until we can > grab tasks again. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19035) Miss metrics when coprocessor use region scanner to read data
[ https://issues.apache.org/jira/browse/HBASE-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263689#comment-16263689 ] Guanghao Zhang commented on HBASE-19035: The branch-1 HADOOP QA build timeout See https://builds.apache.org/job/PreCommit-HBASE-Build/9963/console It take 6 hours to run the hbase-server test, then timeout. 09:19:11 cd /testptch/hbase/hbase-server 09:19:11 mvn -Dmaven.repo.local=/home/jenkins/yetus-m2/hbase-branch-1-patch-1 -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/TestClassFinder.java,**/client.TestGet.java,**/master.cleaner.TestReplicationZKNodeCleaner.java,**/snapshot.TestExportSnapshot.java,**/master.TestAssignmentManagerMetrics.java,**/client.TestShell.java,**/master.assignment.TestAssignmentManager.java,**/master.assignment.TestMergeTableRegionsProcedure.java,**/client.TestAsyncTableGetMultiThreaded.java,**/security.visibility.TestVisibilityLabelsOnNewVersionBehaviorTable.java,**/master.balancer.TestFavoredStochasticLoadBalancer.java,**/client.TestBlockEvictionFromClient.java,**/security.access.TestCoprocessorWhitelistMasterObserver.java,**/master.TestRollingRestart.java,**/client.TestTableSnapshotScanner.java,**/client.TestAsyncTableScanAll.java,**/rsgroup.TestRSGroups.java,**/quotas.TestMasterSpaceQuotaObserver.java,**/replication.TestReplicationKillSlaveRS.java,**/replication.TestReplicationDroppedTables.java,**/quotas.TestQuotaThrottle.java,**/client.TestReplicasClient.java,**/snapshot.TestMobRestoreFlushSnapshotFromClient.java,**/client.locking.TestEntityLocks.java,**/client.TestScannersFromClientSide.java,**/quotas.TestSpaceQuotasWithSnapshots.java,**/client.TestMobSnapshotCloneIndependence.java,**/client.TestReplicaWithCluster.java,**/quotas.TestQuotaAdmin.java,**/TestCheckTestClasses.java,**/master.procedure.TestEnableTableProcedure.java,**/regionserver.TestSplitTransactionOnCluster.java,**/client.TestMultiParallel.java,**/client.TestSizeFailures.java,**/client.TestRestoreSnapshotFromCli
entWithRegionReplicas.java,**/client.TestAdmin2.java,**/regionserver.TestHRegion.java,**/master.procedure.TestTruncateTableProcedure.java,**/security.visibility.TestVisibilityLabelsWithACL.java,**/master.TestWarmupRegion.java,**/snapshot.TestSecureExportSnapshot.java,**/io.encoding.TestLoadAndSwitchEncodeOnDisk.java,**/master.procedure.TestServerCrashProcedure.java,**/client.replication.TestReplicationAdminWithClusters.java,**/client.TestHCM.java,**/client.replication.TestReplicationAdminWithTwoDifferentZKClusters.java,**/TestJMXListener.java,**/trace.TestHTraceHooks.java,**/replication.TestReplicationSyncUpTool.java,**/client.TestMultiRespectsLimits.java,**/regionserver.TestCompactionInDeadRegionServer.java,**/client.TestAsyncTableAdminApi.java,**/snapshot.TestMobSecureExportSnapshot.java,**/replication.TestMasterReplication.java,**/client.TestAsyncSnapshotAdminApi.java,**/master.assignment.TestAssignmentOnRSCrash.java,**/regionserver.wal.TestAsyncLogRolling.java,**/replication.TestReplicationSmallTests.java,**/snapshot.TestMobFlushSnapshotFromClient.java,**/quotas.TestSnapshotQuotaObserverChore.java,**/TestAcidGuarantees.java,**/master.assignment.TestSplitTableRegionProcedure.java,**/replication.regionserver.TestTableBasedReplicationSourceManagerImpl.java,**/TestZooKeeper.java,**/fs.TestBlockReorder.java,**/client.TestCloneSnapshotFromClient.java,**/security.token.TestTokenAuthentication.java,**/coprocessor.TestRegionObserverInterface.java,**/regionserver.TestFSErrorsExposed.java,**/client.TestMetaWithReplicas.java,**/client.TestFromClientSideWithCoprocessor.java,**/master.TestDistributedLogSplitting.java,**/TestServerSideScanMetricsFromClientSide.java,**/regionserver.TestPerColumnFamilyFlush.java,**/client.TestMobCloneSnapshotFromClient.java,**/TestRegionRebalancing.java,**/security.visibility.TestVisibilityLabelsWithDeletes.java,**/master.procedure.TestMasterFailoverWithProcedures.java,**/master.cleaner.TestHFileCleaner.java clean test -fae > 
/testptch/patchprocess/patch-unit-hbase-server.txt 2>&1 15:30:49 Build timed out (after 420 minutes). Marking the build as failed. 15:30:50 Build was aborted > Miss metrics when coprocessor use region scanner to read data > - > > Key: HBASE-19035 > URL: https://issues.apache.org/jira/browse/HBASE-19035 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19035.branch-1.001.patch, > HBASE-19035.branch-1.patch, HBASE-19035.branch-1.patch, > HBASE-19035.branch-1.patch, HBASE-19035.branch-1.patch, > HBASE-19035.master.001.patch, HBASE-19035.master.002.patch, > HBASE-19035.master.003.patch, HBASE-19035.master.003.patch > > > Region interface is exposed to coprocessor. So coprocessor use getScanner to > get a
[jira] [Updated] (HBASE-19319) Fix bug in synchronizing over ProcedureEvent
[ https://issues.apache.org/jira/browse/HBASE-19319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Appy updated HBASE-19319: - Attachment: HBASE-19319.master.002.patch > Fix bug in synchronizing over ProcedureEvent > > > Key: HBASE-19319 > URL: https://issues.apache.org/jira/browse/HBASE-19319 > Project: HBase > Issue Type: Bug >Reporter: Appy >Assignee: Appy > Attachments: HBASE-19319.master.001.patch, > HBASE-19319.master.002.patch > > > Following synchronizes over local variable rather than the original > ProcedureEvent object. Clearly a bug since this code block won't follow > exclusion with many of the synchronized methods in ProcedureEvent class. > {code} > @Override > public void wakeEvents(final int count, final ProcedureEvent... events) { > final boolean traceEnabled = LOG.isTraceEnabled(); > schedLock(); > try { > int waitingCount = 0; > for (int i = 0; i < count; ++i) { > final ProcedureEvent event = events[i]; > synchronized (event) { > if (!event.isReady()) { > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
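The locking invariant at stake can be illustrated generically (this is not the HBase scheduler code): state guarded by an object's own synchronized methods is only mutually excluded by external blocks that lock that same object's monitor; locking anything else gives no exclusion at all.

```java
// Generic illustration of the monitor-matching rule behind HBASE-19319
// (hypothetical classes, not ProcedureEvent or the scheduler code).
class EventMonitorDemo {
    static class Event {
        private boolean ready;
        synchronized boolean isReady() { return ready; } // locks 'this'
        synchronized void wake() { ready = true; }       // locks 'this'
    }

    // Correct caller: synchronize on the event itself, so the check-and-act
    // sequence excludes concurrent isReady()/wake() calls on that event.
    static boolean needsWake(Event event) {
        synchronized (event) {
            // We provably hold the same monitor the Event methods use.
            return Thread.holdsLock(event) && !event.isReady();
        }
    }
}
```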
[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263652#comment-16263652 ] Guangxu Cheng commented on HBASE-19325: --- bq. Don't we need to preserve / provide the same method signature? I will submit the branch-1 patch later. Thanks > Pass a list of server name to postClearDeadServers > -- > > Key: HBASE-19325 > URL: https://issues.apache.org/jira/browse/HBASE-19325 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-beta-2 >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng > Attachments: HBASE-19325.branch-2.001.patch > > > Over on the tail of HBASE-18131, [~chia7712] said > {quote} > (Revisiting the AccessController remind me of this issue) > Could we remove the duplicate code on the server side? Why not pass a list of > server name to postClearDeadServers and postListDeadServers? > {quote} > The duplicate code has been removed in HBASE-19131. Now pass a list of server > names to postClearDeadServers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19325) Pass a list of server name to postClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263651#comment-16263651 ] Guangxu Cheng commented on HBASE-19325: --- bq. What about passing the servers coming from the request to preClearDeadServers? preClearDeadServers is called only in AccessController, and does not use the variable deadservers. So, I do not think it is necessary to pass dead servers to preClearDeadServers. WDYT? Thanks > Pass a list of server name to postClearDeadServers > -- > > Key: HBASE-19325 > URL: https://issues.apache.org/jira/browse/HBASE-19325 > Project: HBase > Issue Type: Bug >Affects Versions: 2.0.0-beta-2 >Reporter: Guangxu Cheng >Assignee: Guangxu Cheng > Attachments: HBASE-19325.branch-2.001.patch > > > Over on the tail of HBASE-18131, [~chia7712] said > {quote} > (Revisiting the AccessController remind me of this issue) > Could we remove the duplicate code on the server side? Why not pass a list of > server name to postClearDeadServers and postListDeadServers? > {quote} > The duplicate code has been removed in HBASE-19131. Now pass a list of server > names to postClearDeadServers. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19330) Remove duplicated dependency from hbase-rest
[ https://issues.apache.org/jira/browse/HBASE-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263629#comment-16263629 ] Hudson commented on HBASE-19330: FAILURE: Integrated in Jenkins build HBase-Trunk_matrix #4100 (See [https://builds.apache.org/job/HBase-Trunk_matrix/4100/]) HBASE-19330 Remove duplicated dependency from hbase-rest (stack: rev 548ebbc574021ca22ba92633678d4f0cec70be0d) * (edit) hbase-rest/pom.xml > Remove duplicated dependency from hbase-rest > > > Key: HBASE-19330 > URL: https://issues.apache.org/jira/browse/HBASE-19330 > Project: HBase > Issue Type: Bug > Components: dependencies >Affects Versions: 3.0.0, 2.0.0-alpha-4 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Trivial > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19330.master.001.patch, > HBASE-19330.master.001.patch > > > In hbase-rest module hbase-hadoop-compat dependency is listed twice. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19318) MasterRpcServices#getSecurityCapabilities explicitly checks for the HBase AccessController implementation
[ https://issues.apache.org/jira/browse/HBASE-19318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263605#comment-16263605 ] Hadoop QA commented on HBASE-19318: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 12s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} branch-2 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 14s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 48s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s{color} | {color:green} branch-2 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 50s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 30s{color} | {color:green} branch-2 passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 9s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 46s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 9s{color} | {color:red} hbase-server: The patch generated 2 new + 16 unchanged - 0 fixed = 18 total (was 16) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 31s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 46m 26s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 82m 6s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}147m 13s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:9f2f2db | | JIRA Issue | HBASE-19318 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898946/HBASE-19318.001.branch-2.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 7a6e54b327de 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | branch-2 / 0ef7a24245 | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/9981/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/9981/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/9981/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was
[jira] [Commented] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer
[ https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263602#comment-16263602 ] Guanghao Zhang commented on HBASE-16868: TestAsyncReplicationAdminApiWithClusters was added recently. Attach a 011 patch to fix it. > Add a replicate_all flag to avoid misuse the namespaces and table-cfs config > of replication peer > > > Key: HBASE-16868 > URL: https://issues.apache.org/jira/browse/HBASE-16868 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0, 3.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-16868.master.001.patch, > HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, > HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, > HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, > HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, > HBASE-16868.master.010.patch, HBASE-16868.master.011.patch > > > First add a new peer by shell cmd. > {code} > add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase". > {code} > If we don't set namespaces and table cfs in peer config. It means replicate > all tables to the peer cluster. > Then append a table to the peer config. > {code} > append_peer_tableCFs '1', {"table1" => []} > {code} > Then this peer will only replicate table1 to the peer cluster. It changes to > replicate only one table from replicate all tables in the cluster. It is very > easy to misuse in production cluster. So we should avoid appending table to a > peer which replicates all table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-16868) Add a replicate_all flag to avoid misuse the namespaces and table-cfs config of replication peer
[ https://issues.apache.org/jira/browse/HBASE-16868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-16868: --- Attachment: HBASE-16868.master.011.patch > Add a replicate_all flag to avoid misuse the namespaces and table-cfs config > of replication peer > > > Key: HBASE-16868 > URL: https://issues.apache.org/jira/browse/HBASE-16868 > Project: HBase > Issue Type: Improvement > Components: Replication >Affects Versions: 2.0.0, 3.0.0 >Reporter: Guanghao Zhang >Assignee: Guanghao Zhang >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-16868.master.001.patch, > HBASE-16868.master.002.patch, HBASE-16868.master.003.patch, > HBASE-16868.master.004.patch, HBASE-16868.master.005.patch, > HBASE-16868.master.006.patch, HBASE-16868.master.007.patch, > HBASE-16868.master.008.patch, HBASE-16868.master.009.patch, > HBASE-16868.master.010.patch, HBASE-16868.master.011.patch > > > First add a new peer by shell cmd. > {code} > add_peer '1', CLUSTER_KEY => "server1.cie.com:2181:/hbase". > {code} > If we don't set namespaces and table cfs in peer config. It means replicate > all tables to the peer cluster. > Then append a table to the peer config. > {code} > append_peer_tableCFs '1', {"table1" => []} > {code} > Then this peer will only replicate table1 to the peer cluster. It changes to > replicate only one table from replicate all tables in the cluster. It is very > easy to misuse in production cluster. So we should avoid appending table to a > peer which replicates all table. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263593#comment-16263593 ] Ted Yu commented on HBASE-19333: bq. Make sure it does not come off as 'instruction' or a 'command'. I read the description again and don't see which part constitutes a command. > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with a POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
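The reflection workaround Timothy describes looks roughly like this on a stand-in class (the real getSnapshotFiles takes a Configuration, FileSystem, and Path; the names below are illustrative only): a private static method is reachable via getDeclaredMethod plus setAccessible.

```java
import java.lang.reflect.Method;

// Hypothetical demonstration of calling a private static method via
// reflection, as described in the linked thread. 'Hidden'/'secretSum'
// are stand-ins, not ExportSnapshot's actual members.
class PrivateStaticCall {
    static class Hidden {
        private static int secretSum(int a, int b) { return a + b; }
    }

    static int invokeSecretSum(int a, int b) {
        try {
            Method m = Hidden.class.getDeclaredMethod("secretSum", int.class, int.class);
            m.setAccessible(true);             // lift the 'private' restriction
            return (int) m.invoke(null, a, b); // null receiver: static method
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Reflection like this breaks whenever the private signature changes, which is exactly why the issue proposes exposing a stable POJO-based API instead.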
[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations
[ https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263578#comment-16263578 ] Guanghao Zhang commented on HBASE-19301: bq. we need to add doc that it will be a short-circuited connection Yes, doc it clearly will be better. :-) > Provide way for CPs to create short circuited connection with custom > configurations > --- > > Key: HBASE-19301 > URL: https://issues.apache.org/jira/browse/HBASE-19301 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, > HBASE-19301_V2.patch > > > Over in HBASE-18359 we have discussions for this. > Right now HBase provide getConnection() in RegionCPEnv, MasterCPEnv etc. But > this returns a pre created connection (per server). This uses the configs at > hbase-site.xml at that server. > Phoenix needs creating connection in CP with some custom configs. Having this > custom changes in hbase-site.xml is harmful as that will affect all > connections been created at that server. > This issue is for providing an overloaded getConnection(Configuration) API -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations
[ https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263576#comment-16263576 ] Guanghao Zhang commented on HBASE-19301: Opened HBASE-19334 for the ACL problem. The ACL problem is an old problem for which I left a TODO. I misunderstood this issue and thought it might resolve the ACL problem... So I left a comment here. Now we can continue the discussion in HBASE-19334. > Provide way for CPs to create short circuited connection with custom > configurations > --- > > Key: HBASE-19301 > URL: https://issues.apache.org/jira/browse/HBASE-19301 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, > HBASE-19301_V2.patch > > > Over in HBASE-18359 we have discussions for this. > Right now HBase provide getConnection() in RegionCPEnv, MasterCPEnv etc. But > this returns a pre created connection (per server). This uses the configs at > hbase-site.xml at that server. > Phoenix needs creating connection in CP with some custom configs. Having this > custom changes in hbase-site.xml is harmful as that will affect all > connections been created at that server. > This issue is for providing an overloaded getConnection(Configuration) API -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations
[ https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263569#comment-16263569 ] stack commented on HBASE-19301: --- [~zghaobac] Ok. Thanks. Your issue that we need to add doc that it will be a short-circuited connection and that if you do not want that, you need to create your own outside of the CpEnv offerings is a good comment. > Provide way for CPs to create short circuited connection with custom > configurations > --- > > Key: HBASE-19301 > URL: https://issues.apache.org/jira/browse/HBASE-19301 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, > HBASE-19301_V2.patch > > > Over in HBASE-18359 we have discussions for this. > Right now HBase provide getConnection() in RegionCPEnv, MasterCPEnv etc. But > this returns a pre created connection (per server). This uses the configs at > hbase-site.xml at that server. > Phoenix needs creating connection in CP with some custom configs. Having this > custom changes in hbase-site.xml is harmful as that will affect all > connections been created at that server. > This issue is for providing an overloaded getConnection(Configuration) API -- This message was sent by Atlassian JIRA (v6.4.14#64029)
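The intent of the proposed getConnection(Configuration) overload can be sketched with plain maps (hypothetical types; HBase actually uses org.apache.hadoop.conf.Configuration): per-connection overrides overlay the server-wide hbase-site.xml values without mutating them, so other connections on the server are unaffected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the overlay semantics a per-CP configuration
// overload implies. Plain Maps stand in for Hadoop Configuration objects.
class ConnectionConfSketch {
    static Map<String, String> merge(Map<String, String> siteConf,
                                     Map<String, String> overrides) {
        Map<String, String> merged = new HashMap<>(siteConf); // copy, don't mutate
        merged.putAll(overrides); // per-connection values win over site defaults
        return merged;
    }
}
```

The key property is the copy: the server-wide map is left untouched, which is the whole point of not editing hbase-site.xml for one coprocessor's needs.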
[jira] [Commented] (HBASE-19323) Make netty engine default in hbase2
[ https://issues.apache.org/jira/browse/HBASE-19323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263564#comment-16263564 ] Hadoop QA commented on HBASE-19323: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 14s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 55s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 58s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 7m 16s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 44s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 6m 12s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 55s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 14s{color} | {color:red} hbase-server: The patch generated 1 new + 8 unchanged - 0 fixed = 9 total (was 8) {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 9s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 69m 24s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 45s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}124m 44s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 21s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black}218m 57s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 | | JIRA Issue | HBASE-19323 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898932/HBASE-19323.master.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux dc093b52dabb 3.13.0-133-generic #182-Ubuntu SMP Tue Sep 19 15:49:21 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 194efe3e5a | | maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) | | Default Java | 1.8.0_151 | | checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/9980/artifact/patchprocess/diff-checkstyle-hbase-server.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/9980/testReport/ | | modules | C: hbase-server U: hbase-server | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/9980/console | | Powered by | Apache Yetus 0.6.0 http://yetus.apache.org | This message was automatically
[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263562#comment-16263562 ] stack commented on HBASE-19333: --- bq. HBASE-15762 Consider hbase-client to be shaded by default in 2.0 Yeah, bad idea there too. bq. I looked for Tim's handle on JIRA. The first one was from fellowshipvillage. The second one has different spelling in last name. You do it on the mailing list out in public so those addressed (or those watching) understand that key attributes of our community, ones we'd like to talk up, are encouraged participation and inclusion, and that there is no need for a mediator filing or fixing issues in our project. Be careful too how you make your suggestion. Make sure it does not come off as 'instruction' or a 'command'. bq. I just felt using reflection is not something that would be accepted in the open source version. With a proper tool, there would be no need to expose the files. > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List> getSnapshotFiles(final > Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263549#comment-16263549 ] Ted Yu commented on HBASE-19333: bq. Why file on behalf of others? I looked for Tim's handle on JIRA. The first one was from fellowshipvillage. The second one has different spelling in last name. bq. Why not encourage them to file their own issues? I have no problem changing reporter to Tim. Will request the person to log JIRA in the future. I will also remind people who don't follow this practice. bq. the problem is a cleaning tool True. Open sourcing the tool is another matter. I just felt using reflection is not something that would be accepted in the open source version. > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List> getSnapshotFiles(final > Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19334) User.runAsLoginUser not work in AccessController because it use a short circuited connection
[ https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-19334: --- Description: The short-circuited connection will bypass the RPC and the RPC context didn't change. So it still use the old RPC user to write ACL table and User.runAsLoginUser not work. AccessController's grant method. {code} User.runAsLoginUser(new PrivilegedExceptionAction() { @Override public Void run() throws Exception { // regionEnv is set at #start. Hopefully not null at this point. try (Table table = regionEnv.getConnection(). getTable(AccessControlLists.ACL_TABLE_NAME)) { AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm, table, request.getMergeExistingPermissions()); } return null; } }); {code} was:The short-circuited connection will bypass the RPC and the RPC context didn't change. So it still use the old RPC user to write ACL table and User.runAsLoginUser not work. > User.runAsLoginUser not work in AccessController because it use a short > circuited connection > > > Key: HBASE-19334 > URL: https://issues.apache.org/jira/browse/HBASE-19334 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang > > The short-circuited connection will bypass the RPC and the RPC context didn't > change. So it still use the old RPC user to write ACL table and > User.runAsLoginUser not work. > AccessController's grant method. > {code} > User.runAsLoginUser(new PrivilegedExceptionAction() { > @Override > public Void run() throws Exception { > // regionEnv is set at #start. Hopefully not null at this point. > try (Table table = regionEnv.getConnection(). > getTable(AccessControlLists.ACL_TABLE_NAME)) { > > AccessControlLists.addUserPermission(regionEnv.getConfiguration(), perm, > table, > request.getMergeExistingPermissions()); > } > return null; > } > }); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
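The mechanism behind this bug can be sketched in plain Java (an illustrative analogy, not HBase code): the identity swap happens in the RPC layer, so a call that short-circuits past that layer still observes the original RPC caller.

```java
// Hypothetical sketch of why the short circuit defeats the user swap described
// in HBASE-19334. All names are invented stand-ins; the point is only that the
// swap lives in a layer the short-circuited path never passes through.
public class ShortCircuitSketch {
    static final ThreadLocal<String> CURRENT_USER =
        ThreadLocal.withInitial(() -> "rpc-user");

    // Normal path: the RPC layer swaps in the login user around the call.
    static String callViaRpcLayer() {
        String saved = CURRENT_USER.get();
        CURRENT_USER.set("login-user");
        try {
            return writeAclTable();
        } finally {
            CURRENT_USER.set(saved);
        }
    }

    // Short-circuited path: the RPC layer is bypassed, so no swap happens.
    static String callShortCircuited() {
        return writeAclTable();
    }

    static String writeAclTable() {
        return CURRENT_USER.get(); // the user the write is attributed to
    }
}
```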
[jira] [Commented] (HBASE-19330) Remove duplicated dependency from hbase-rest
[ https://issues.apache.org/jira/browse/HBASE-19330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263546#comment-16263546 ] Hudson commented on HBASE-19330: FAILURE: Integrated in Jenkins build HBase-2.0 #898 (See [https://builds.apache.org/job/HBase-2.0/898/]) HBASE-19330 Remove duplicated dependency from hbase-rest (stack: rev 0ef7a24245359ada37473b05e6cca4aad46b5225) * (edit) hbase-rest/pom.xml > Remove duplicated dependency from hbase-rest > > > Key: HBASE-19330 > URL: https://issues.apache.org/jira/browse/HBASE-19330 > Project: HBase > Issue Type: Bug > Components: dependencies >Affects Versions: 3.0.0, 2.0.0-alpha-4 >Reporter: Peter Somogyi >Assignee: Peter Somogyi >Priority: Trivial > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19330.master.001.patch, > HBASE-19330.master.001.patch > > > In hbase-rest module hbase-hadoop-compat dependency is listed twice. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263543#comment-16263543 ] Ted Yu commented on HBASE-19333: bq. And whats with the 'Consider'? Why file issue an issue for a 'Consideration'? I am confused. There have been precedents. e.g. HBASE-15762 Consider hbase-client to be shaded by default in 2.0 > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List> getSnapshotFiles(final > Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19334) User.runAsLoginUser not work in AccessController because it use a short circuited connection
[ https://issues.apache.org/jira/browse/HBASE-19334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guanghao Zhang updated HBASE-19334: --- Description: The short-circuited connection bypasses the RPC layer, so the RPC context does not change. As a result it still uses the old RPC user to write the ACL table, and User.runAsLoginUser does not work. > User.runAsLoginUser not work in AccessController because it use a short > circuited connection > > > Key: HBASE-19334 > URL: https://issues.apache.org/jira/browse/HBASE-19334 > Project: HBase > Issue Type: Bug >Reporter: Guanghao Zhang > > The short-circuited connection bypasses the RPC layer, so the RPC context does > not change. As a result it still uses the old RPC user to write the ACL table, > and User.runAsLoginUser does not work. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (HBASE-19334) User.runAsLoginUser not work in AccessController because it use a short circuited connection
Guanghao Zhang created HBASE-19334: -- Summary: User.runAsLoginUser not work in AccessController because it use a short circuited connection Key: HBASE-19334 URL: https://issues.apache.org/jira/browse/HBASE-19334 Project: HBase Issue Type: Bug Reporter: Guanghao Zhang -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19301) Provide way for CPs to create short circuited connection with custom configurations
[ https://issues.apache.org/jira/browse/HBASE-19301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263536#comment-16263536 ] Guanghao Zhang commented on HBASE-19301: Sorry, sir. This issue is the right place to resolve what the subject describes. I commented here because I wanted to get some feedback from [~anoop.hbase]. I should open a new issue to discuss the ACL problem... I think that can be resolved later. > Provide way for CPs to create short circuited connection with custom > configurations > --- > > Key: HBASE-19301 > URL: https://issues.apache.org/jira/browse/HBASE-19301 > Project: HBase > Issue Type: Sub-task > Components: Coprocessors >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19301.patch, HBASE-19301_V2.patch, > HBASE-19301_V2.patch > > > Over in HBASE-18359 we have discussions for this. > Right now HBase provide getConnection() in RegionCPEnv, MasterCPEnv etc. But > this returns a pre created connection (per server). This uses the configs at > hbase-site.xml at that server. > Phoenix needs creating connection in CP with some custom configs. Having this > custom changes in hbase-site.xml is harmful as that will affect all > connections been created at that server. > This issue is for providing an overloaded getConnection(Configuration) API -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263521#comment-16263521 ] Duo Zhang commented on HBASE-16890: --- BTW what do you mean by ‘completely asynchronous’? > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-16890) Analyze the performance of AsyncWAL and fix the same
[ https://issues.apache.org/jira/browse/HBASE-16890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263513#comment-16263513 ] Duo Zhang commented on HBASE-16890: --- MVCC is assigned before calling consumer. It is an optimization which is done by [~carp84] . And what is your config for the ycsb test? Thanks. > Analyze the performance of AsyncWAL and fix the same > > > Key: HBASE-16890 > URL: https://issues.apache.org/jira/browse/HBASE-16890 > Project: HBase > Issue Type: Sub-task > Components: wal >Affects Versions: 2.0.0 >Reporter: ramkrishna.s.vasudevan >Assignee: ramkrishna.s.vasudevan >Priority: Blocker > Fix For: 2.0.0-beta-1 > > Attachments: AsyncWAL_disruptor.patch, AsyncWAL_disruptor_1 > (2).patch, AsyncWAL_disruptor_3.patch, AsyncWAL_disruptor_3.patch, > AsyncWAL_disruptor_4.patch, AsyncWAL_disruptor_6.patch, > HBASE-16890-rc-v2.patch, HBASE-16890-rc-v3.patch, > HBASE-16890-remove-contention-v1.patch, HBASE-16890-remove-contention.patch, > Screen Shot 2016-10-25 at 7.34.47 PM.png, Screen Shot 2016-10-25 at 7.39.07 > PM.png, Screen Shot 2016-10-25 at 7.39.48 PM.png, Screen Shot 2016-11-04 at > 5.21.27 PM.png, Screen Shot 2016-11-04 at 5.30.18 PM.png, async.svg, > classic.svg, contention.png, contention_defaultWAL.png, > ycsb_FSHlog.vs.Async.png > > > Tests reveal that AsyncWAL under load in single node cluster performs slower > than the Default WAL. This task is to analyze and see if we could fix it. > See some discussions in the tail of JIRA HBASE-15536. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-19329) hbase regionserver log output error (quota)
[ https://issues.apache.org/jira/browse/HBASE-19329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser resolved HBASE-19329. Resolution: Invalid Please use the user mailing list for help in debugging your system. JIRA is not the place for this. https://hbase.apache.org/mail-lists.html > hbase regionserver log output error (quota) > > > Key: HBASE-19329 > URL: https://issues.apache.org/jira/browse/HBASE-19329 > Project: HBase > Issue Type: Bug >Affects Versions: 1.3.1 >Reporter: gehaijiang > > 2017-11-16 02:50:33,474 WARN > [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: > Unable to read user from quota table > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 3 > actions: Table 'hbase:quota' was not found, got: hbase:namespace.: 3 times, > servers with issues: null > , > at > org.apache.hadoop.hbase.quotas.QuotaTableUtil.doGet(QuotaTableUtil.java:330) > at > org.apache.hadoop.hbase.quotas.QuotaUtil.fetchUserQuotas(QuotaUtil.java:155) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore$3.fetchEntries(QuotaCache.java:256) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetch(QuotaCache.java:290) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetchUserQuotaState(QuotaCache.java:248) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:213) > 2017-11-16 02:55:33,453 WARN > [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: > Unable to read namespace from quota table > org.apache.hadoop.hbase.TableNotFoundException: Table 'hbase:quota' was not > found, got: hbase:namespace. 
> at > org.apache.hadoop.hbase.quotas.QuotaTableUtil.doGet(QuotaTableUtil.java:330) > at > org.apache.hadoop.hbase.quotas.QuotaUtil.fetchGlobalQuotas(QuotaUtil.java:220) > at > org.apache.hadoop.hbase.quotas.QuotaUtil.fetchNamespaceQuotas(QuotaUtil.java:207) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore$1.fetchEntries(QuotaCache.java:226) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetch(QuotaCache.java:290) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.fetchNamespaceQuotaState(QuotaCache.java:218) > at > org.apache.hadoop.hbase.quotas.QuotaCache$QuotaRefresherChore.chore(QuotaCache.java:211) > 2017-11-16 02:55:33,488 WARN > [blackstone064030,16020,1510632966258_ChoreService_1] quotas.QuotaCache: > Unable to read table from quota table > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed > 47 actions: Table 'hbase:quota' was not found, got: hbase:namespace.: 47 > times, servers with issues: nu > ll, -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-19333. --- Resolution: Incomplete Resolving as incomplete/invalid. > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List> getSnapshotFiles(final > Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
[ https://issues.apache.org/jira/browse/HBASE-19333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263488#comment-16263488 ] stack commented on HBASE-19333: --- Why file on behalf of others? Why not encourage them to file their own issues? Besides, the problem is a cleaning tool... Not "exposing ExportSnapshot#getSnapshotFiles through POJO class". And what's with the 'Consider'? Why file an issue for a 'Consideration'? Filing issues should be more than 'Considerations'. In fact, let me close this as ill-specified. > Consider exposing ExportSnapshot#getSnapshotFiles through POJO class > > > Key: HBASE-19333 > URL: https://issues.apache.org/jira/browse/HBASE-19333 > Project: HBase > Issue Type: Improvement >Reporter: Ted Yu > > In the thread, > http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 > , Timothy mentioned that he used reflection to get to > ExportSnapshot#getSnapshotFiles(). > {code} > private static List> getSnapshotFiles(final > Configuration conf, > final FileSystem fs, final Path snapshotDir) throws IOException { > {code} > SnapshotFileInfo is protobuf. > We should consider exposing the API by replacing the protobuf class with POJO > class. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
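The reflection workaround mentioned in the thread can be sketched in plain Java; everything here is an invented stand-in (it is not ExportSnapshot code), showing only the technique of reaching a private static method from outside its class, and why it is fragile compared with a supported API.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of calling a private static method via reflection, as
// Timothy described doing with ExportSnapshot#getSnapshotFiles. It works, but
// it bypasses the public API and silently breaks if the method is renamed or
// its signature changes; hence the request for a supported, POJO-based API.
public class PrivateAccessSketch {
    // Stand-in for a private internal method of another class.
    private static String listSnapshotFiles(String snapshotName) {
        return "files-of-" + snapshotName;
    }

    public static String callViaReflection(String snapshotName) throws Exception {
        Method m = PrivateAccessSketch.class
            .getDeclaredMethod("listSnapshotFiles", String.class);
        m.setAccessible(true); // required because the method is private
        return (String) m.invoke(null, snapshotName); // null receiver: static method
    }
}
```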
[jira] [Commented] (HBASE-19266) TestAcidGuarantees should cover adaptive in-memory compaction
[ https://issues.apache.org/jira/browse/HBASE-19266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263474#comment-16263474 ] Ted Yu commented on HBASE-19266: TestAcidGuaranteesWithBasicPoli didn't finish in the QA run. This might be due to test environment. > TestAcidGuarantees should cover adaptive in-memory compaction > - > > Key: HBASE-19266 > URL: https://issues.apache.org/jira/browse/HBASE-19266 > Project: HBase > Issue Type: Test >Reporter: Ted Yu >Assignee: Chia-Ping Tsai >Priority: Minor > Attachments: HBASE-19266.v0.patch > > > Currently TestAcidGuarantees populates 3 policies of (in-memory) compaction. > Adaptive in-memory compaction is new and should be added as 4th compaction > policy. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HBASE-19317) Increase "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" to avoid host-related failures on MiniMRCluster
[ https://issues.apache.org/jira/browse/HBASE-19317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-19317: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks, Ted. > Increase > "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" > to avoid host-related failures on MiniMRCluster > > > Key: HBASE-19317 > URL: https://issues.apache.org/jira/browse/HBASE-19317 > Project: HBase > Issue Type: Bug > Components: integration tests, test >Reporter: Josh Elser >Assignee: Josh Elser > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19317.001.branch-2.patch, > HBASE-19317.002.branch-2.patch > > > YARN (2.7.4, at least) defaults to asserting at least 10% of the disk usage > free on the local machine in order for the NodeManagers to function. > On my development machine, despite having over 50G free, I would see the > warning from the NM that all the local dirs were bad which would cause the > test to become stuck waiting to submit a mapreduce job. Surefire would > eventually kill the process. > We should increase this value to avoid it causing us headache. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
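For reference, the property named in the title would be raised in the test cluster's YARN configuration along these lines; the 99.0 value here is illustrative, not necessarily the value the patch uses (YARN 2.7.x defaults to 90.0, i.e. it requires 10% free as described).

```xml
<property>
  <!-- YARN marks a NodeManager local dir bad once disk usage passes this
       percentage; raising it keeps MiniMRCluster NodeManagers healthy on
       nearly-full development disks. -->
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>99.0</value>
</property>
```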
[jira] [Updated] (HBASE-19310) Verify IntegrationTests don't rely on Rules outside of JUnit context
[ https://issues.apache.org/jira/browse/HBASE-19310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-19310: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Thanks Ted and Stack. > Verify IntegrationTests don't rely on Rules outside of JUnit context > > > Key: HBASE-19310 > URL: https://issues.apache.org/jira/browse/HBASE-19310 > Project: HBase > Issue Type: Bug > Components: integration tests >Reporter: Romil Choksi >Assignee: Josh Elser >Priority: Critical > Fix For: 2.0.0-beta-1 > > Attachments: HBASE-19310.001.branch-2.patch, > HBASE-19310.002.branch-2.patch > > > {noformat} > 2017-11-16 00:43:41,204 INFO [main] mapreduce.IntegrationTestImportTsv: > Running test testGenerateAndLoad. > Exception in thread "main" java.lang.NullPointerException > at org.apache.hadoop.hbase.TableName.valueOf(TableName.java:461) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.testGenerateAndLoad(IntegrationTestImportTsv.java:189) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.run(IntegrationTestImportTsv.java:229) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) > at > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv.main(IntegrationTestImportTsv.java:239) > {noformat} > (Potential line-number skew) > {code} > @Test > public void testGenerateAndLoad() throws Exception { > LOG.info("Running test testGenerateAndLoad."); > final TableName table = TableName.valueOf(name.getMethodName()); > {code} > The JUnit framework sets the test method name inside of the JUnit {{Rule}}. > When we invoke the test directly (ala {{hbase > org.apache.hadoop.hbase.mapreduce.IntegrationTestImportTsv}}), this > {{getMethodName()}} returns {{null}} and we get the above stacktrace. > Should make a pass over the ITs with main methods and {{Rule}}'s to make sure > we don't have this lurking. 
Another alternative is to just remove the main > methods and just force use of {{IntegrationTestsDriver}} instead. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
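The guard suggested by the stack trace can be sketched in plain Java (an illustrative stand-in, not the actual patch): JUnit's TestName rule only has a value inside a JUnit run, so a test that is also invoked via main() must fall back to a fixed name instead of passing null to TableName.valueOf.

```java
// Hypothetical sketch of guarding a JUnit TestName rule value. The fallback
// string mirrors the method name in the report; the rule itself is simulated
// here by a plain String parameter so the sketch has no JUnit dependency.
public class RuleGuardSketch {
    public static String tableName(String ruleMethodName) {
        // Outside the JUnit runner the rule yields null, which previously
        // flowed into TableName.valueOf and caused the NullPointerException.
        return ruleMethodName != null ? ruleMethodName : "testGenerateAndLoad";
    }
}
```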
[jira] [Commented] (HBASE-19163) "Maximum lock count exceeded" from region server's batch processing
[ https://issues.apache.org/jira/browse/HBASE-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263451#comment-16263451 ] stack commented on HBASE-19163: --- Man. Great find [~huaxiang] Thats bad. Critical yes. Did you get chance to file issue on it sir? On patch, LGTM. Nice. Was wondering about this bit: 5652} catch (Error error) { 5653 // The maximum lock count for read lock is 64K (hardcoded), when this maximum count 5654 // is reached, it will throw out an Error. This Error needs to be caught so it can 5655 // go ahead to process the minibatch with lock acquired. 5656 IOException ioe = new IOException(); 5657 ioe.initCause(error); 5658 TraceUtil.addTimelineAnnotation("Error getting row lock"); 5659 throw ioe; The Error could be anything. It could be > 64k locks. It could be an OOME. I suppose no harm catching it and trying to press on persisting the batch. Do we log that entered this clause? Might be useful to add if not when trying to debug an odd issue. If we saw this log a few lines up, we'd know we were already in dire straits. Thanks. > "Maximum lock count exceeded" from region server's batch processing > --- > > Key: HBASE-19163 > URL: https://issues.apache.org/jira/browse/HBASE-19163 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 3.0.0, 1.2.7, 2.0.0-alpha-3 >Reporter: huaxiang sun >Assignee: huaxiang sun > Attachments: HBASE-19163-master-v001.patch, > HBASE-19163.master.001.patch, HBASE-19163.master.002.patch, > HBASE-19163.master.004.patch, HBASE-19163.master.005.patch, > HBASE-19163.master.006.patch, unittest-case.diff > > > In one of use cases, we found the following exception and replication is > stuck. 
> {code} > 2017-10-25 19:41:17,199 WARN [hconnection-0x28db294f-shared--pool4-t936] > client.AsyncProcess: #3, table=foo, attempt=5/5 failed=262836ops, last > exception: java.io.IOException: java.io.IOException: Maximum lock count > exceeded > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2215) > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:109) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:185) > at > org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:165) > Caused by: java.lang.Error: Maximum lock count exceeded > at > java.util.concurrent.locks.ReentrantReadWriteLock$Sync.fullTryAcquireShared(ReentrantReadWriteLock.java:528) > at > java.util.concurrent.locks.ReentrantReadWriteLock$Sync.tryAcquireShared(ReentrantReadWriteLock.java:488) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1327) > at > java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871) > at > org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:5163) > at > org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:3018) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2877) > at > org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2819) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:753) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:715) > at > org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2148) > at > org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33656) > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2170) > ... 
3 more > {code} > While we are still examining the data pattern, it is sure that there are too > many mutations in the batch against the same row, this exceeds the maximum > 64k shared lock count and it throws an error and failed the whole batch. > There are two approaches to solve this issue. > 1). Let's say there are mutations against the same row in the batch, we just > need to acquire the lock once for the same row vs to acquire the lock for > each mutation. > 2). We catch the error and start to process whatever it gets and loop back. > With HBASE-17924, approach 1 seems easy to implement now. > Create the jira and will post update/patch when investigation moving forward. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
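The limit behind this failure is easy to reproduce in isolation. ReentrantReadWriteLock packs the shared (read) hold count into 16 bits of its synchronizer state, so once 65535 read locks are held, the next acquisition throws java.lang.Error("Maximum lock count exceeded") rather than a checked exception:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Runnable illustration of the hardcoded 64K read-lock limit mentioned above.
public class ReadLockLimitSketch {
    // Acquire up to 'attempts' read locks on one thread; return how many
    // succeeded before the Error (if any) was thrown.
    public static int acquireUntilError(int attempts) {
        ReentrantReadWriteLock.ReadLock readLock =
            new ReentrantReadWriteLock().readLock();
        int acquired = 0;
        try {
            for (int i = 0; i < attempts; i++) {
                readLock.lock();
                acquired++;
            }
        } catch (Error expected) {
            // java.lang.Error: Maximum lock count exceeded
        } finally {
            for (int i = 0; i < acquired; i++) {
                readLock.unlock();
            }
        }
        return acquired;
    }
}
```

This is why a batch with very many mutations against the same row, each taking its own shared row lock, can blow past the limit; acquiring the lock once per row (approach 1) avoids it.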
[jira] [Commented] (HBASE-19332) DumpReplicationQueues misreports total WAL size
[ https://issues.apache.org/jira/browse/HBASE-19332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263450#comment-16263450 ] Hadoop QA commented on HBASE-19332:
---
| (x) *{color:red}-1 overall{color}* |
\\ \\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 10s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 9s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 24s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 12s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 52s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 23s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 47m 48s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 2.7.4 or 3.0.0-alpha4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 6s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 20s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}145m 7s{color} | {color:black} {color} |
\\ \\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:eee3b01 |
| JIRA Issue | HBASE-19332 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898929/HBASE-19332.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 91b396108c96 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 194efe3e5a |
| maven | version: Apache Maven 3.5.2 (138edd61fd100ec658bfa2d307c43b76940a5d7d; 2017-10-18T07:58:13Z) |
| Default Java | 1.8.0_151 |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/9979/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/9979/console |
| Powered by | Apache Yetus 0.6.0 http://yetus.apache.org |
This message was automatically generated.
> DumpReplicationQueues misreports total WAL size >
[jira] [Created] (HBASE-19333) Consider exposing ExportSnapshot#getSnapshotFiles through POJO class
Ted Yu created HBASE-19333: -- Summary: Consider exposing ExportSnapshot#getSnapshotFiles through POJO class Key: HBASE-19333 URL: https://issues.apache.org/jira/browse/HBASE-19333 Project: HBase Issue Type: Improvement Reporter: Ted Yu In the thread, http://search-hadoop.com/m/HBase/YGbbUxY9FyU74X?subj=Re+Deleting+and+cleaning+old+snapshots+exported+to+S3 , Timothy mentioned that he used reflection to get to ExportSnapshot#getSnapshotFiles().
{code}
private static List<Pair<SnapshotFileInfo, Long>> getSnapshotFiles(final Configuration conf,
    final FileSystem fs, final Path snapshotDir) throws IOException {
{code}
SnapshotFileInfo is a protobuf class. We should consider exposing the API by replacing the protobuf class with a POJO class.
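A POJO along these lines could stand in for the protobuf SnapshotFileInfo in a public-facing variant of the API. This is only a sketch of the idea; the class and field names are illustrative assumptions, not actual HBase classes:

```java
// Hypothetical plain-Java replacement for the protobuf SnapshotFileInfo in a
// public variant of ExportSnapshot#getSnapshotFiles(); names are illustrative.
public class SnapshotFileInfoPojo {
    public enum Type { HFILE, WAL }

    private final Type type;
    private final String path;    // e.g. region/family/hfile for Type.HFILE
    private final long fileSize;  // size in bytes, as reported by the export

    public SnapshotFileInfoPojo(Type type, String path, long fileSize) {
        this.type = type;
        this.path = path;
        this.fileSize = fileSize;
    }

    public Type getType() { return type; }
    public String getPath() { return path; }
    public long getFileSize() { return fileSize; }

    public static void main(String[] args) {
        SnapshotFileInfoPojo info =
            new SnapshotFileInfoPojo(Type.HFILE, "region/cf/abc123", 4096L);
        System.out.println(info.getType() + " " + info.getFileSize());
    }
}
```

Callers like Timothy's snapshot-cleanup tool could then consume the list without linking against HBase's shaded protobuf types or resorting to reflection.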
[jira] [Commented] (HBASE-19204) branch-1.2 times out and is taking 6-7 hours to complete
[ https://issues.apache.org/jira/browse/HBASE-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263422#comment-16263422 ] Xiao Chen commented on HBASE-19204: --- Hi Stack, The surefire tarball was what let me consistently reproduce the jvm issue, with the provided command. My reproduction was in a docker container running ubuntu (14.04 iirc), but I think it should reproduce in any env with 7u151 openjdk. Not sure what the fix (or the exact jvm bug) is besides '7u161 doesn't have it!' :) > branch-1.2 times out and is taking 6-7 hours to complete > > > Key: HBASE-19204 > URL: https://issues.apache.org/jira/browse/HBASE-19204 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack > > Sean has been looking at tooling and infra. This Umbrella is about looking > at actual tests. For example, running locally on a dedicated machine, I picked a > random test, TestPerColumnFamilyFlush. In my test run, it wrote 16M lines. It > seems to be having zk issues but it is catching interrupts and ignoring them > ([~carp84] fixed this in later versions over in HBASE-18441). > Let me try and do some fixup under this umbrella so we can get a 1.2.7 out > the door.
[jira] [Commented] (HBASE-19204) branch-1.2 times out and is taking 6-7 hours to complete
[ https://issues.apache.org/jira/browse/HBASE-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263407#comment-16263407 ] stack commented on HBASE-19204: --- Thanks [~xiaochen] for coming by w/ helpful input. I should bundle the surefirebooter*.jar up into our Docker container? > branch-1.2 times out and is taking 6-7 hours to complete > > > Key: HBASE-19204 > URL: https://issues.apache.org/jira/browse/HBASE-19204 > Project: HBase > Issue Type: Umbrella > Components: test >Reporter: stack > > Sean has been looking at tooling and infra. This Umbrella is about looking > at actual tests. For example, running locally on a dedicated machine, I picked a > random test, TestPerColumnFamilyFlush. In my test run, it wrote 16M lines. It > seems to be having zk issues but it is catching interrupts and ignoring them > ([~carp84] fixed this in later versions over in HBASE-18441). > Let me try and do some fixup under this umbrella so we can get a 1.2.7 out > the door.