[jira] [Commented] (HBASE-14082) Add replica id to JMX metrics names
[ https://issues.apache.org/jira/browse/HBASE-14082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698572#comment-14698572 ] Hadoop QA commented on HBASE-14082: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750690/HBASE-14082-v5.patch against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750690 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 11 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat {color:red}-1 core zombie tests{color}. 
There are 2 zombie test(s):
at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2.testMRIncrementalLoadWithLocality(TestHFileOutputFormat2.java:399)
at org.apache.activemq.broker.RecoveryBrokerTest.testConsumedQueuePersistentMessagesLostOnRestart(RecoveryBrokerTest.java:193)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15120//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15120//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15120//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15120//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15120//console

This message is automatically generated.

Add replica id to JMX metrics names
---
Key: HBASE-14082
URL: https://issues.apache.org/jira/browse/HBASE-14082
Project: HBase
Issue Type: Improvement
Components: metrics
Reporter: Lei Chen
Assignee: Lei Chen
Attachments: HBASE-14082-v1.patch, HBASE-14082-v2.patch, HBASE-14082-v3.patch, HBASE-14082-v4.patch, HBASE-14082-v5.patch

Today, via JMX, one cannot distinguish a primary region from a replica. A possible solution is to add the replica id to JMX metrics names. The benefits may include, for example:
# Knowing the latency of a read request on a replica region means the first attempt to the primary region has timed out.
# Write requests on replicas are due to the replication process, while the ones on the primary are from clients.
# When looking for hot spots of read operations, replicas should be excluded since TIMELINE reads are sent to all replicas.
To implement, we can change the format of metrics names found at {code}Hadoop-HBase-RegionServer-Regions-Attributes{code} from {code}namespace_namespace_table_tablename_region_regionname_metric_metricname{code} to {code}namespace_namespace_table_tablename_region_regionname_replicaid_replicaid_metric_metricname{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
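The proposed name format can be sketched roughly as below. This is only an illustration of the string layout described in the issue, not the code from the attached patches; the class name, the method names, and the sample values ("default", "t1", "abc123", "readRequestCount") are made up for the example, and treating replica id 0 as the primary is an assumption stated in the code.

```java
final class RegionMetricNameSketch {
  /** Builds a per-region metric name in the proposed format, including the replica id. */
  static String metricName(String namespace, String table, String region,
      int replicaId, String metric) {
    return "namespace_" + namespace
        + "_table_" + table
        + "_region_" + region
        + "_replicaid_" + replicaId
        + "_metric_" + metric;
  }

  /**
   * Tells primary from replica by looking at the name alone.
   * Assumption: replica id 0 denotes the primary region.
   */
  static boolean isPrimary(String metricName) {
    return metricName.contains("_replicaid_0_metric_");
  }

  public static void main(String[] args) {
    // Hypothetical sample values, for illustration only.
    System.out.println(metricName("default", "t1", "abc123", 1, "readRequestCount"));
  }
}
```

With names shaped like this, a monitoring tool could drop replica metrics when hunting for read hot spots by filtering on the `_replicaid_` segment.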
[jira] [Commented] (HBASE-14190) Assign system tables ahead of user region assignment
[ https://issues.apache.org/jira/browse/HBASE-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698587#comment-14698587 ] Ted Yu commented on HBASE-14190: The original intention of this JIRA didn't go as far as giving system tables their own WALs. HBASE-13556 can be kept open, in my opinion. Assign system tables ahead of user region assignment Key: HBASE-14190 URL: https://issues.apache.org/jira/browse/HBASE-14190 Project: HBase Issue Type: Bug Reporter: Ted Yu Assignee: Ted Yu Priority: Critical Attachments: 14190-v12.txt, 14190-v6.txt, 14190-v7.txt, 14190-v8.txt Currently the namespace table region is assigned like user regions. I spent several hours working with a customer where master couldn't finish initialization. Even though master was restarted quite a few times, it went down with the following: {code} 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] master.HMaster: Master server abort: loaded coprocessors are: [] 2015-08-05 17:16:57,530 FATAL [hdpmaster1:6.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown. java.io.IOException: Timedout 30ms waiting for namespace table to be assigned at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:104) at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:985) at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:779) at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:182) at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1646) at java.lang.Thread.run(Thread.java:744) {code} During previous run(s), namespace table was created, hence leaving an entry in hbase:meta. The following if block in TableNamespaceManager#start() was skipped: {code} if (!MetaTableAccessor.tableExists(masterServices.getConnection(), TableName.NAMESPACE_TABLE_NAME)) { {code} TableNamespaceManager#start() spins, waiting for namespace region to be assigned. 
There was an issue in the master assigning user regions. We tried issuing the 'assign' command from the hbase shell, which didn't work because of the following check in MasterRpcServices#assignRegion(): {code} master.checkInitialized(); {code} This scenario can be avoided if we assign the hbase:namespace table after hbase:meta is assigned but before user table region assignment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
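The ordering proposed above (hbase:meta, then system tables such as hbase:namespace, then user tables) can be sketched as a simple priority sort. This is a toy model over table-name strings, not the master's assignment code; the class and method names are hypothetical, and treating every table whose name starts with "hbase:" as a system table is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class AssignmentOrderSketch {
  /** 0 for hbase:meta, 1 for other tables in the "hbase" namespace, 2 for user tables. */
  static int priority(String tableName) {
    if (tableName.equals("hbase:meta")) {
      return 0;
    }
    if (tableName.startsWith("hbase:")) {
      return 1; // assumption: the "hbase" namespace holds only system tables
    }
    return 2;
  }

  /** Returns a copy ordered so that meta comes first, then system tables, then user tables. */
  static List<String> assignmentOrder(List<String> tables) {
    List<String> ordered = new ArrayList<>(tables);
    // List.sort is stable, so tables of equal priority keep their relative order.
    ordered.sort(Comparator.comparingInt(AssignmentOrderSketch::priority));
    return ordered;
  }
}
```

Under this ordering the namespace table no longer waits behind a slow or failing batch of user regions, which is the situation the stack trace in the description shows.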
[jira] [Commented] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698559#comment-14698559 ] Hadoop QA commented on HBASE-13127: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750688/13127.alternate.v3.txt against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750688 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.security.visibility.TestDefaultScanLabelGeneratorStack Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15119//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15119//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15119//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15119//console This message is automatically generated. Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
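What a per-test timeout buys can be sketched with plain java.util.concurrent. This is not the approach taken in the attached patches (their contents are not shown here); JUnit 4 has its own mechanisms for this, such as `@Test(timeout = ...)` and the `Timeout` rule. The class and method names below are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {
  /**
   * Runs the body on a worker thread and reports whether it finished in time.
   * A body that overruns is interrupted instead of lingering as a zombie.
   */
  static boolean finishesWithin(Runnable body, long timeoutMs) throws Exception {
    ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "test-body");
      t.setDaemon(true); // a hung body must not keep the JVM alive
      return t;
    });
    Future<?> result = pool.submit(body);
    try {
      result.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      result.cancel(true); // interrupts the worker thread
      return false;
    } finally {
      pool.shutdownNow();
    }
  }
}
```

A test that fails fast with a timeout shows up as an ordinary failure in the report, which is easier to track than a hung JVM that only surfaces as a zombie in the console output.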
[jira] [Commented] (HBASE-12812) Update Netty dependency to latest release
[ https://issues.apache.org/jira/browse/HBASE-12812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698579#comment-14698579 ] Hudson commented on HBASE-12812: SUCCESS: Integrated in HBase-TRUNK #6731 (See [https://builds.apache.org/job/HBase-TRUNK/6731/]) HBASE-12812 Update Netty dependency to latest release (Jurriaan Mous) (stack: rev 737f264509284420e6fa8c14d92fe9fbdb49f67f) * pom.xml Update Netty dependency to latest release - Key: HBASE-12812 URL: https://issues.apache.org/jira/browse/HBASE-12812 Project: HBase Issue Type: Improvement Reporter: Jurriaan Mous Assignee: Jurriaan Mous Fix For: 2.0.0 Attachments: 12812v2.txt, HBASE-12812.patch The Netty version was 4.0.23.Release of August 15th. Let's update to 4.0.25, which contains some performance improvements and bug fixes. http://netty.io/news/2014/10/29/4-0-24-Final.html http://netty.io/news/2014/12/31/4-0-25-Final.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13838) Fix shared TaskStatusTmpl.jamon issues (coloring, content, etc.)
[ https://issues.apache.org/jira/browse/HBASE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698693#comment-14698693 ] Hadoop QA commented on HBASE-13838: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750704/hbase-13838_post_put_command.txt against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750704 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15121//console This message is automatically generated. Fix shared TaskStatusTmpl.jamon issues (coloring, content, etc.) Key: HBASE-13838 URL: https://issues.apache.org/jira/browse/HBASE-13838 Project: HBase Issue Type: Bug Components: UI Affects Versions: 1.1.0 Reporter: Lars George Assignee: Matt Warhaftig Labels: beginner Fix For: 2.0.0, 1.3.0 Attachments: hbase-13838-v1.patch, hbase-13838_post.tiff, hbase-13838_post_put_command.txt, hbase-13838_pre.tiff There are a few issues with the shared TaskStatusTmpl: - Client operations tab is always empty For Master this is expected, but for RegionServers there is never anything listed either. Fix for RS status page (probably caused by params not containing Operation subclass anymore, but some PB generated classes?) - Hide “Client Operations” tab for master UI Since operations are RS only. Or we fix this and make other calls show here. - The alert-error for aborted tasks is not set in CSS at all When a task was aborted it should be amber or red, but the assigned style is not in any of the linked stylesheets (abort-error). Add. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13838) Fix shared TaskStatusTmpl.jamon issues (coloring, content, etc.)
[ https://issues.apache.org/jira/browse/HBASE-13838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Warhaftig updated HBASE-13838: --- Attachment: hbase-13838_post_put_command.txt hbase-13838_post.tiff hbase-13838_pre.tiff Attached 'hbase-13838_pre.tiff' and 'hbase-13838_post.tiff' show that master UI page no longer has Operations tab and that alerts are now properly colored via Bootstrap. Attached 'hbase-13838_post_put_command.txt' shows that the Operations JSON lists actual data for a PUT (and any RPC operation). This displayed data was the open question about security concerns. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14078) improve error message when HMaster can't bind to port
[ https://issues.apache.org/jira/browse/HBASE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698737#comment-14698737 ] Hadoop QA commented on HBASE-14078: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750708/hbase-14078_post_stack.txt against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750708 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15122//console This message is automatically generated. improve error message when HMaster can't bind to port - Key: HBASE-14078 URL: https://issues.apache.org/jira/browse/HBASE-14078 Project: HBase Issue Type: Improvement Components: master Affects Versions: 2.0.0 Reporter: Sean Busbey Assignee: Matt Warhaftig Labels: beginner Fix For: 2.0.0 Attachments: hbase-14078_post_stack.txt, hbase-14708-v1.patch, hbase-14708-v2.patch When the master fails to start because hbase.master.port is already taken, the log messages could make it easier to tell.
{quote}
2015-07-14 13:10:02,667 INFO [main] regionserver.RSRpcServices: master/master01.example.com/10.20.188.121:16000 server-side HConnection retries=350
2015-07-14 13:10:02,879 INFO [main] ipc.SimpleRpcScheduler: Using deadline as user call queue, count=3
2015-07-14 13:10:02,895 ERROR [main] master.HMasterCommandLine: Master exiting
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2258)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:234)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2272)
Caused by: java.net.BindException: Address already in use
at sun.nio.ch.Net.bind0(Native Method)
at sun.nio.ch.Net.bind(Net.java:444)
at sun.nio.ch.Net.bind(Net.java:436)
at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
at org.apache.hadoop.hbase.ipc.RpcServer.bind(RpcServer.java:2513)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.<init>(RpcServer.java:599)
at org.apache.hadoop.hbase.ipc.RpcServer.<init>(RpcServer.java:2000)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.<init>(RSRpcServices.java:919)
at org.apache.hadoop.hbase.master.MasterRpcServices.<init>(MasterRpcServices.java:211)
at org.apache.hadoop.hbase.master.HMaster.createRpcServices(HMaster.java:509)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:535)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:351)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2253)
... 5 more
{quote}

I recognize that the RSRpcServices log message shows port 16000, but I don't know why a new operator would. Additionally, it'd be nice to tell them that the port is controlled by {{hbase.master.port}}. Maybe give a hint on how to see what's using the port. Could be too os-dist specific?

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
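The kind of message asked for here can be sketched with the JDK alone. This is not the attached patch, just one way to enrich a BindException with the port and the controlling configuration key; the class name, the method name, and the message wording are made up for the example.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class BindErrorSketch {
  /**
   * Binds a server socket to the given port. On BindException, rethrows with
   * the port number and the config key that controls it in the message.
   */
  static ServerSocket bind(int port, String configKey) throws IOException {
    ServerSocket socket = new ServerSocket();
    try {
      socket.bind(new InetSocketAddress(port));
      return socket;
    } catch (BindException e) {
      socket.close();
      BindException wrapped = new BindException(
          "Could not bind to port " + port + ": " + e.getMessage()
          + ". The port is set by '" + configKey
          + "'; check whether another process is already using it.");
      wrapped.initCause(e); // keep the original exception in the stack trace
      throw wrapped;
    }
  }
}
```

Only BindException is rewrapped, so an unrelated I/O failure during setup does not get a misleading hint about port configuration.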
[jira] [Updated] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13127: -- Attachment: 13127.alternate.v3.txt Cryptic 'shutting down' message with hung threads in failed TestDefaultScanLabelGeneratorStack. Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14224) Fix coprocessor handling of duplicate classes
[ https://issues.apache.org/jira/browse/HBASE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698757#comment-14698757 ] Lars George commented on HBASE-14224: - Yes, and I hope I have described that completely in the attached PDF (or the linked note). If not, please add here. Fix coprocessor handling of duplicate classes - Key: HBASE-14224 URL: https://issues.apache.org/jira/browse/HBASE-14224 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 2.0.0, 1.0.1, 1.2.0, 1.1.1 Reporter: Lars George Priority: Critical Attachments: problem.pdf While discussing with [~misty] over on HBASE-13907 we noticed some inconsistency when copros are loaded. Sometimes you can load them more than once, sometimes you cannot. Need to consolidate. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14078) improve error message when HMaster can't bind to port
[ https://issues.apache.org/jira/browse/HBASE-14078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt Warhaftig updated HBASE-14078: --- Attachment: hbase-14078_post_stack.txt Thanks for the feedback [~stack]. Below are responses inline. {quote} Formatting is a little off. Please check. {quote} Can you point me towards the misformatted code? - I didn't see it when looking at the diff. {quote} Do you have example of what the emission looks like now? {quote} See attached 'hbase-14078_post_stack.txt'. {quote} Whatever the IOE exception that comes up out of setting up the rpc server, we will always have this suffix about how to config port. Will it always be a port issue? Perhaps work on BindException only? And only if 'Address in use'? {quote} You are correct with your questioning. I will tighten port issue error logic when making the earlier mentioned formatting change. {quote} The changes in HMaster.java do not seem to make for the same emission. Is that intentional? For example, before patch, if an Exception, we used to e.getCause().getMessage() if non-null but now if I read it right, we do e.toString {quote} Yes, the change was intentional because existing HMaster thrown errors include useful messages that were previously ignored when only e.getCause() was displayed. The e.getCause() message is still displayed after this change, just one level down the error stack now.
[jira] [Commented] (HBASE-14224) Fix coprocessor handling of duplicate classes
[ https://issues.apache.org/jira/browse/HBASE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698755#comment-14698755 ] Lars George commented on HBASE-14224: - Whoops, here is the online version, if that helps: https://www.evernote.com/l/ACFO6OrjlNNHeZDPxhubGw8uDUSwAaOgxQU -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13127: -- Attachment: 13127.alternate.v4.txt Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v4.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698887#comment-14698887 ] stack commented on HBASE-13127: --- Says:

kalashnikov:hbase.git stack$ python ./dev-support/findHangingTests.py https://builds.apache.org/job/PreCommit-HBASE-Build/15123/consoleText
Fetching the console output from the URL
Printing hanging tests
Printing Failing tests
Failing test : org.apache.hadoop.hbase.master.TestDistributedLogSplitting

Results :
Failed tests:
TestDistributedLogSplitting.testLogReplayTwoSequentialRSDown:653 expected:<1000> but was:<896>

So, this test looks like it can also report as a zombie.

Tests run: 18, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 263.523 sec FAILURE! - in org.apache.hadoop.hbase.master.TestDistributedLogSplitting
testLogReplayTwoSequentialRSDown(org.apache.hadoop.hbase.master.TestDistributedLogSplitting) Time elapsed: 40.392 sec FAILURE!
java.lang.AssertionError: expected:<1000> but was:<896>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hbase.master.TestDistributedLogSplitting.testLogReplayTwoSequentialRSDown(TestDistributedLogSplitting.java:653)

Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver.
Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13127: -- Attachment: 13127.alternate.v3.txt Cleanup of TDLS. Make it so should not be a zombie anymore (we were not shutting down zk clusters in a few places). Doubt it will fix flakey test but hopefully no longer a zombie (This change unrelated but test fixing) Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698901#comment-14698901 ] Hadoop QA commented on HBASE-13127: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750729/13127.alternate.v4.txt against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750729 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + WALSplitter.getRegionDirRecoveredEditsDir(HRegion.getRegionDir(tdir, hri.getEncodedName())); +assertTrue("edits dir should have more than a single file in it. instead has " + files.length, + WALSplitter.getRegionDirRecoveredEditsDir(HRegion.getRegionDir(tdir, hri.getEncodedName())); + WALSplitter.getRegionDirRecoveredEditsDir(HRegion.getRegionDir(tdir, hri.getEncodedName())); +new HLogKey(curRegionInfo.getEncodedNameAsBytes(), tableName, System.currentTimeMillis()), + NavigableSet<Path> recoveredEdits = WALSplitter.getSplitEditFilesSorted(fs, regionDirs.get(0)); {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.TestTimeout Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15124//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15124//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15124//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15124//console This message is automatically generated. Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v4.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
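For readers unfamiliar with why per-test timeouts reduce zombies: a test that hangs never returns control to the build, while a bounded test fails and lets the run continue. The patch itself is not shown in this thread; JUnit 4 offers `@Test(timeout = ...)` and the `org.junit.rules.Timeout` rule for this purpose. The plain-JDK sketch below only illustrates the mechanism, and the class and method names (`TimeoutSketch`, `finishesWithin`) are hypothetical, not taken from HBase.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration, not HBase code: run a test body with an upper bound
// on its duration so a hang becomes a failure instead of a zombie.
public class TimeoutSketch {

    /** Returns true if the body completed in time, false if it had to be interrupted. */
    public static boolean finishesWithin(Callable<Void> body, long timeoutMillis)
            throws Exception {
        // Daemon thread so a stuck body cannot keep the JVM alive on its own.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timeout-sketch");
            t.setDaemon(true);
            return t;
        });
        Future<Void> future = executor.submit(body);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the hung body
            return false;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Interruption only helps if the body responds to it, and resources such as mini ZooKeeper clusters still have to be shut down explicitly, which is the kind of cleanup stack's comment on this issue describes.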
[jira] [Commented] (HBASE-13158) When client supports CellBlock, return the result Cells as controller payload for get(Get) API also
[ https://issues.apache.org/jira/browse/HBASE-13158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699002#comment-14699002 ] Anoop Sam John commented on HBASE-13158: I saw some performance degradation after this change, about 2 or 3%. Debugging that led to some other perf optimizations as well. I have not tested perf after all of those. Will do it ASAP, Stack, and report back. One difference is that when the PB way of cell transfer happens, it encodes the Cell bytes into a stream of predetermined size, whereas we use a resizable OS. Maybe that matters. Will report back more on this in 2 days' time, Stack. When client supports CellBlock, return the result Cells as controller payload for get(Get) API also --- Key: HBASE-13158 URL: https://issues.apache.org/jira/browse/HBASE-13158 Project: HBase Issue Type: Improvement Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0, 1.3.0 Attachments: 13158v4.suggestion.txt, HBASE-13158.patch, HBASE-13158_V2.patch, HBASE-13158_V3.patch, HBASE-13158_V4.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13127) Add timeouts on all tests so less zombie sightings
[ https://issues.apache.org/jira/browse/HBASE-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698837#comment-14698837 ] Hadoop QA commented on HBASE-13127: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750714/13127.alternate.v3.txt against master branch at commit 737f264509284420e6fa8c14d92fe9fbdb49f67f. ATTACHMENT ID: 12750714 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15123//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15123//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15123//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15123//console This message is automatically generated. Add timeouts on all tests so less zombie sightings -- Key: HBASE-13127 URL: https://issues.apache.org/jira/browse/HBASE-13127 Project: HBase Issue Type: Improvement Components: test Reporter: stack Assignee: stack Attachments: 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.txt, 13127.alternate.v2.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.alternate.v3.txt, 13127.txt, 13127v2.txt [~Apache9] and [~octo47] have been working hard at trying to get our builds passing again. They are almost there. TRUNK just failed with a zombie TestMasterObserver. Help the lads out by adding timeouts on all tests so less zombie incidence... will help identify the frequent failing issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14229) Flushing canceled by coprocessor still leads to memstoreSize set down
sunyerui created HBASE-14229: Summary: Flushing canceled by coprocessor still leads to memstoreSize set down Key: HBASE-14229 URL: https://issues.apache.org/jira/browse/HBASE-14229 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.1.1, 0.98.13, 1.0.2, 1.2.0 Reporter: sunyerui A coprocessor that overrides public InternalScanner preFlush(final Store store, final InternalScanner scanner) and returns NULL from this method will cancel the flush request, leaving the snapshot un-flushed and no new storefile created. But HRegion.internalFlushCache still sets memstoreSize down to 0 by subtracting totalFlushableSize. If there are no more write requests, memstoreSize will remain 0, and no more flush requests will be processed because of the check of memstoreSize.get() <= 0 at the beginning of internalFlushCache. This issue may not cause data loss, but it will confuse coprocessor users. If we agree with this, I'll supply a patch later. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
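The accounting problem described above can be reproduced in a toy model. This is a minimal sketch, not HBase code: `FlushAccountingSketch`, `PreFlushHook`, and the `adjustOnlyWhenFlushed` flag are hypothetical names, the early-exit check is modeled as `<= 0`, and the guarded branch is only one conceivable fix, since the reporter's patch is not part of this message.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy model of the reported behaviour; none of these names are HBase APIs.
public class FlushAccountingSketch {

    /** Stand-in for a coprocessor preFlush hook; returning null cancels the flush. */
    public interface PreFlushHook {
        Object preFlush(Object scanner);
    }

    private final AtomicLong memstoreSize = new AtomicLong(); // the accounting counter
    private long unflushedBytes;                              // data really held in memory
    private int storeFiles;

    public void write(long bytes) {
        memstoreSize.addAndGet(bytes);
        unflushedBytes += bytes;
    }

    /**
     * @param adjustOnlyWhenFlushed false models the reported behaviour; true models an
     *        assumed fix that leaves the counter alone when the hook cancels the flush.
     * @return true if a store file was written
     */
    public boolean flush(PreFlushHook hook, boolean adjustOnlyWhenFlushed) {
        if (memstoreSize.get() <= 0) {
            return false; // early exit: the counter says there is nothing to flush
        }
        long totalFlushableSize = unflushedBytes;
        boolean flushed = hook.preFlush(new Object()) != null;
        if (flushed) {
            storeFiles++;
            unflushedBytes = 0;
        }
        if (flushed || !adjustOnlyWhenFlushed) {
            // Reported behaviour: this runs even when the flush was cancelled.
            memstoreSize.addAndGet(-totalFlushableSize);
        }
        return flushed;
    }

    public long memstoreSize() { return memstoreSize.get(); }
    public long unflushedBytes() { return unflushedBytes; }
    public int storeFiles() { return storeFiles; }
}
```

In the unguarded mode, a cancelled flush leaves `unflushedBytes` at its old value while `memstoreSize` drops to 0, so a later flush request is skipped by the early-exit check until new writes arrive.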