[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894281#comment-13894281 ] Hadoop QA commented on HBASE-10313:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627560/10313v2.txt
against trunk revision .

ATTACHMENT ID: 12627560

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100.

{color:red}-1 site{color}. The patch appears to cause mvn site goal to fail.

{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8624//console
This message is automatically generated.
Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
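The duplication typically comes in transitively: Jetty 6 ships its own repackaged servlet-api-2.5 artifact alongside the javax.servlet one that Hadoop pulls in. A hedged sketch of the kind of Maven exclusion that drops one copy; the coordinates below are an assumption for illustration, not taken from the actual 10313 patch:

```xml
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <exclusions>
    <!-- keep only javax.servlet:servlet-api, drop Jetty's repackaged copy -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api-2.5</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

`mvn dependency:tree` on the assembly module would confirm which path drags in the second jar before choosing what to exclude.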
[jira] [Commented] (HBASE-10463) Filter on columns containing numerics yield wrong results
[ https://issues.apache.org/jira/browse/HBASE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894285#comment-13894285 ] Deepa Vasanthkumar commented on HBASE-10463: Thanks [~ndimiduk]. For the time being, this was resolved by introducing a numerical comparator and adding it to the HBase package in the dev environment, since we are on a lower HBase version. Filter on columns containing numerics yield wrong results - Key: HBASE-10463 URL: https://issues.apache.org/jira/browse/HBASE-10463 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.8 Reporter: Deepa Vasanthkumar Original Estimate: 168h Remaining Estimate: 168h Used SingleColumnValueFilter with CompareFilter.CompareOp.GREATER_OR_EQUAL for filtering the scan result. However, for columns which hold numeric values, the scan result is not correct because of the lexicographic comparison. Does HBase support numeric value filters (for equal, greater or equal, ...) for columns? If not, can we add it? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
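The wrong results can be reproduced without an HBase cluster: an unsigned byte-by-byte comparison (what BinaryComparator effectively applies to cell values) orders ASCII-encoded numbers lexicographically, so "100" sorts before "20". A minimal standalone sketch of the two comparison modes; the class and method names are illustrative, not HBase API:

```java
import java.nio.charset.StandardCharsets;

public class LexVsNumeric {
    // Unsigned lexicographic byte comparison, the ordering HBase's
    // BinaryComparator applies to cell values.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] v20 = "20".getBytes(StandardCharsets.UTF_8);
        byte[] v100 = "100".getBytes(StandardCharsets.UTF_8);
        // Lexicographically "100" < "20" (since '1' < '2'), so a
        // GREATER_OR_EQUAL "20" filter would wrongly drop the 100 cell.
        System.out.println(compareBytes(v100, v20) < 0);                   // true
        // A numeric comparator parses the value before comparing.
        System.out.println(Long.parseLong("100") >= Long.parseLong("20")); // true
    }
}
```

One mitigation that needs no custom comparator is storing numbers fixed-width big-endian (e.g. via Bytes.toBytes(long)), where unsigned lexicographic order matches numeric order for non-negative values.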
[jira] [Updated] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] binlijin updated HBASE-3909: Attachment: HBASE-3909-backport-from-fb-for-trunk-4.patch Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894291#comment-13894291 ] binlijin commented on HBASE-3909: - HBASE-3909-backport-from-fb-for-trunk-4.patch makes short-circuit reads dynamic. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-9501) Provide throttling for replication
[ https://issues.apache.org/jira/browse/HBASE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Honghua updated HBASE-9501: Attachment: HBASE-9501-trunk_v3.patch new patch per [~jdcryans]'s review feedback, thanks [~jdcryans] Provide throttling for replication -- Key: HBASE-9501 URL: https://issues.apache.org/jira/browse/HBASE-9501 Project: HBase Issue Type: Improvement Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-9501-trunk_v0.patch, HBASE-9501-trunk_v1.patch, HBASE-9501-trunk_v2.patch, HBASE-9501-trunk_v3.patch When we disable a peer for a period of time and then enable it, the ReplicationSource in the master cluster will push the hlog entries accumulated during the disabled interval to the re-enabled peer cluster at full speed. If the bandwidth of the two clusters is shared by different applications, a full-speed replication push can use all the bandwidth and severely influence other applications. Though there are two configs, replication.source.size.capacity and replication.source.nb.capacity, to tweak the batch size each push delivers, decreasing them increases the number of pushes, and all these pushes still proceed continuously without pause, so they are of no obvious help for bandwidth throttling. From a bandwidth-sharing and push-speed perspective, it's more reasonable to provide a bandwidth upper limit for each peer push channel; within that limit, the peer can choose a big batch size for each push for bandwidth efficiency. Any opinion? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9501) Provide throttling for replication
[ https://issues.apache.org/jira/browse/HBASE-9501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894317#comment-13894317 ] Feng Honghua commented on HBASE-9501: - bq. How do you feel about using a class instead of a static method? IIUC you could push the two AtomicLongs in there, move the unit test out of TestReplicationSmallTests (so that you don't have to pay the price for setUp()) and have meaningful method names instead of lines like this one == done bq. The InterruptedException should be caught, if something told ReplicationSource to stop then we shouldn't ignore it and try to continue shipping edits == What about adding a log and interrupting the current thread? The same problem exists with ReplicationSource.sleepForRetries(), and there seem to be various kinds of handling for sleep's InterruptedException in HBase: some ignore it, some don't catch it and delegate to upper callers... Provide throttling for replication -- Key: HBASE-9501 URL: https://issues.apache.org/jira/browse/HBASE-9501 Project: HBase Issue Type: Improvement Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-9501-trunk_v0.patch, HBASE-9501-trunk_v1.patch, HBASE-9501-trunk_v2.patch, HBASE-9501-trunk_v3.patch When we disable a peer for a period of time and then enable it, the ReplicationSource in the master cluster will push the hlog entries accumulated during the disabled interval to the re-enabled peer cluster at full speed. If the bandwidth of the two clusters is shared by different applications, a full-speed replication push can use all the bandwidth and severely influence other applications. Though there are two configs, replication.source.size.capacity and replication.source.nb.capacity, to tweak the batch size each push delivers, decreasing them increases the number of pushes, and all these pushes still proceed continuously without pause, so they are of no obvious help for bandwidth throttling.
From a bandwidth-sharing and push-speed perspective, it's more reasonable to provide a bandwidth upper limit for each peer push channel; within that limit, the peer can choose a big batch size for each push for bandwidth efficiency. Any opinion? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
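The per-peer bandwidth cap described above could be enforced with a simple per-cycle byte quota. A hypothetical sketch (the class and method names are assumptions, not the actual HBASE-9501 patch), written with an injectable clock so it can be exercised deterministically:

```java
import java.util.function.LongSupplier;

// Hypothetical sketch of per-peer push throttling: a quota of bytes per
// 100ms cycle; when a batch would exceed the remaining quota, the caller
// is told how long to sleep until the next cycle starts.
public class ReplicationThrottler {
    private static final long CYCLE_MS = 100;
    private final long bytesPerCycle;
    private final LongSupplier clock;   // millisecond clock, injectable for tests
    private long cycleStart;
    private long cyclePushed = 0;

    public ReplicationThrottler(long bandwidthBytesPerSec, LongSupplier clock) {
        this.bytesPerCycle = Math.max(1, bandwidthBytesPerSec * CYCLE_MS / 1000);
        this.clock = clock;
        this.cycleStart = clock.getAsLong();
    }

    /** Milliseconds the caller should sleep before pushing batchSize bytes. */
    public long getNextSleepInterval(long batchSize) {
        long now = clock.getAsLong();
        if (now - cycleStart >= CYCLE_MS) {    // a new cycle: reset the quota
            cycleStart = now;
            cyclePushed = 0;
        }
        if (cyclePushed + batchSize <= bytesPerCycle) {
            return 0;                          // fits in the current cycle
        }
        return CYCLE_MS - (now - cycleStart);  // wait out the current cycle
    }

    /** Record that batchSize bytes were actually pushed. */
    public void addPushSize(long batchSize) {
        cyclePushed += batchSize;
    }
}
```

ReplicationSource would then consult getNextSleepInterval() before each shipment and record it with addPushSize() afterwards: the batch stays big (efficient) while the long-run rate is capped.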
[jira] [Commented] (HBASE-8751) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster
[ https://issues.apache.org/jira/browse/HBASE-8751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894318#comment-13894318 ] Feng Honghua commented on HBASE-8751: - Ping [~andrew.purt...@gmail.com] :-) Enable peer cluster to choose/change the ColumnFamilies/Tables it really want to replicate from a source cluster Key: HBASE-8751 URL: https://issues.apache.org/jira/browse/HBASE-8751 Project: HBase Issue Type: New Feature Components: Replication Reporter: Feng Honghua Assignee: Feng Honghua Attachments: HBASE-8751-0.94-V0.patch, HBASE-8751-0.94-v1.patch, HBASE-8751-trunk_v0.patch, HBASE-8751-trunk_v1.patch, HBASE-8751-trunk_v2.patch, HBASE-8751-trunk_v3.patch Consider these scenarios (all cfs have replication-scope=1): 1) cluster S has 3 tables: table A has cfA,cfB, table B has cfX,cfY, table C has cf1,cf2. 2) cluster X wants to replicate table A : cfA, table B : cfX and table C from cluster S. 3) cluster Y wants to replicate table B : cfY, table C : cf2 from cluster S. The current replication implementation can't achieve this since it pushes the data of all replicatable column-families from cluster S to all its peers, X/Y in this scenario. This improvement provides a fine-grained replication scheme which enables a peer cluster to choose the column-families/tables it really wants from the source cluster: A). Set the table:cf-list for a peer when addPeer: hbase-shell add_peer '3', zk:1100:/hbase, table1; table2:cf1,cf2; table3:cf2 B). View the table:cf-list config for a peer using show_peer_tableCFs: hbase-shell show_peer_tableCFs 1 C). Change/set the table:cf-list for a peer using set_peer_tableCFs: hbase-shell set_peer_tableCFs '2', table1:cfX; table2:cf1; table3:cf1,cf2 In this scheme, replication-scope=1 only means a column-family CAN be replicated to other clusters, but only the 'table:cf-list list' determines WHICH cf/table will actually be replicated to a specific peer.
To provide back-compatibility, an empty 'table:cf-list list' will replicate all replicatable cf/tables. (This means we don't allow a peer which replicates nothing from a source cluster; we think that's reasonable: if replicating nothing, why bother adding a peer?) This improvement addresses the exact problem raised by the first FAQ in http://hbase.apache.org/replication.html: GLOBAL means replicate? Any provision to replicate only to cluster X and not to cluster Y? or is that for later? Yes, this is for much later. I also noticed somebody mentioned making replication-scope an integer rather than a boolean for such fine-grained replication purposes, but I think extending replication-scope can't achieve the same replication granularity and flexibility as the per-peer replication configurations above. This improvement has been running smoothly in our production clusters (Xiaomi) for several months. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
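The "table1; table2:cf1,cf2; table3:cf2" strings above follow an obvious grammar (';' between tables, ':' before an optional cf list, ',' between cfs). A standalone sketch of a parser for that format, as an illustration of the config only; it is not the code from the patch:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TableCfParser {
    /**
     * Parse "table1; table2:cf1,cf2; table3:cf2" into table -> cf list.
     * An empty cf list means "all replicatable column families of the table";
     * an empty/null input means "everything replicatable", matching the
     * back-compatibility rule described in the issue.
     */
    public static Map<String, List<String>> parse(String tableCfs) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        if (tableCfs == null || tableCfs.trim().isEmpty()) {
            return result;
        }
        for (String entry : tableCfs.split(";")) {
            String[] parts = entry.split(":", 2);
            String table = parts[0].trim();
            if (table.isEmpty()) {
                continue;   // tolerate stray separators
            }
            List<String> cfs = new ArrayList<>();
            if (parts.length == 2) {
                for (String cf : parts[1].split(",")) {
                    String c = cf.trim();
                    if (!c.isEmpty()) {
                        cfs.add(c);
                    }
                }
            }
            result.put(table, cfs);
        }
        return result;
    }
}
```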
[jira] [Commented] (HBASE-10483) Provide API for retrieving info port when hbase.master.info.port is set to 0
[ https://issues.apache.org/jira/browse/HBASE-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894322#comment-13894322 ] Liu Shaohui commented on HBASE-10483: - This feature is needed in scenarios where we can't pre-define hbase.master.info.port, e.g. HBase on YARN. Maybe we can write the master's info port to the master zk node when the master acquires the master lock. Then HBaseAdmin can get the master info port from the master zk node. Provide API for retrieving info port when hbase.master.info.port is set to 0 Key: HBASE-10483 URL: https://issues.apache.org/jira/browse/HBASE-10483 Project: HBase Issue Type: Improvement Reporter: Ted Yu When hbase.master.info.port is set to 0, the info port is dynamically determined. An API should be provided so that clients can retrieve the actual info port. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Assigned] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi reassigned HBASE-10480: --- Assignee: Matteo Bertozzi TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
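A fixed sleep bakes in an assumption about roll timing; the usual remedy is to poll for the condition with a deadline. A generic standalone sketch of such a helper (not necessarily the fix that was applied to the test):

```java
import java.util.function.BooleanSupplier;

public class WaitUtil {
    /**
     * Poll condition every intervalMs until it holds or timeoutMs elapses.
     * Returns whether the condition held within the timeout.
     */
    public static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(intervalMs);
        }
        return condition.getAsBoolean();   // one last check at the deadline
    }
}
```

The test could then wait for "rolls >= minRolls" with a generous timeout instead of assuming each roll takes exactly LOG_ROLL_PERIOD, so a ~1.5s overshoot per roll would no longer fail it.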
[jira] [Commented] (HBASE-10472) Manage the interruption in ZKUtil#getData
[ https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894331#comment-13894331 ] Nicolas Liochon commented on HBASE-10472: - bq. I guess we need this for checking the meta location from zk right? Yes. bq. Do we need that to be interruptible as well? If the API is synchronous, it needs to be interruptible... bq. I guess we might need it if we ever want to go meta replicas. Actually, we need it already, for example because a split task can be interrupted. Some HBase code relies on being interruptible already. With replicas or without, you may want to rely on this to free the handlers (for example, if we get stuck on a region, we may use all the handlers of the region server today: if the client cancels the query, interrupting the call does help). bq. We are not setting the interrupted flag for the thread in ReplicationPeersZKImpl, ZKLeaderManager If we throw an exception, it means that we took care of the interruption, so we don't need to reset the interrupt flag (and should not). If we abort the server, we can consider that we took care of the interruption. Keeping the thread interrupted could mean interrupting the abort, which may not be what was expected. bq. Can we get away with some of the changes, if we keep the original ZKUtil.getData() and have ZKUtil.getDataInterruptible()? Not really, imho: we need the code to be clean when there are interruptions. The ZooKeeper API is clean; our wrapping is not clean today. Manage the interruption in ZKUtil#getData - Key: HBASE-10472 URL: https://issues.apache.org/jira/browse/HBASE-10472 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.98.0, 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.1, 0.99.0 Attachments: 10472.v1.patch, 10472.v2.patch Many, but not all, methods in ZKUtil partly hide the interruption: they return null or something like that.
Many times, this will result in an NPE or something undefined. This jira is limited to getData to keep things small enough (it's used in hbase-client code). The code is supposed to behave at least 'as well as before', or better (hopefully). It could be included in a later 0.98 release (0.98.1) and in 0.99. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
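The convention argued for above (either propagate InterruptedException, or swallow it but restore the interrupt flag, and once it is propagated or the server aborts, do not also re-set the flag) can be shown without ZooKeeper. A standalone sketch with hypothetical method names:

```java
public class InterruptHandling {
    /**
     * Swallow the exception but restore the flag: this method does not
     * handle the interruption itself, so callers that check
     * Thread.currentThread().isInterrupted() can still observe it.
     */
    public static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt visible
        }
    }

    /**
     * Propagate: throwing IS the handling here, so (per the reasoning in
     * this issue) the thrower should not additionally re-set the flag.
     */
    public static void sleepInterruptibly(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }
}
```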
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894334#comment-13894334 ] Hadoop QA commented on HBASE-3909:
--

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12627568/HBASE-3909-backport-from-fb-for-trunk-3.patch
against trunk revision .

ATTACHMENT ID: 12627568

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.

{color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile.

{color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}.
The patch introduces the following lines longer than 100:
+private UpdateConfigurationRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); }
+private UpdateConfigurationResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); }
+ * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code>
+ * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code>
+private UpdateConfigurationRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); }
+private UpdateConfigurationResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); }
+ * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code>
+ * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code>

{color:green}+1 site{color}. The mvn site goal succeeds with this patch.

{color:green}+1 core tests{color}. The patch passed unit tests in .
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8625//console
This message is automatically generated.
Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic
[jira] [Commented] (HBASE-10334) RegionServer links in table.jsp is broken
[ https://issues.apache.org/jira/browse/HBASE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894337#comment-13894337 ] Hudson commented on HBASE-10334: FAILURE: Integrated in hbase-0.96 #283 (See [https://builds.apache.org/job/hbase-0.96/283/]) HBASE-10334 RegionServer links in table.jsp is broken (stack: rev 1565547) * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp RegionServer links in table.jsp is broken - Key: HBASE-10334 URL: https://issues.apache.org/jira/browse/HBASE-10334 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10334_v1.patch The links to RS's seem to be broken in table.jsp after HBASE-9892. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894338#comment-13894338 ] Hudson commented on HBASE-9892: --- FAILURE: Integrated in hbase-0.96 #283 (See [https://builds.apache.org/job/hbase-0.96/283/]) HBASE-10340 [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node (stack: rev 1565546) * /hbase/branches/0.96/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/branches/0.96/hbase-protocol/src/main/protobuf/HBase.proto * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, HBASE-9892-0.94-v3.diff, 
HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-0.94-v6.diff, HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-trunk-v3.diff, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (30G) usually cannot be kept within 30s; meanwhile, servers with 64G memory are the norm. So we try to deploy multiple RS instances (2-3) in a single node, with the heap of each RS at about 20G ~ 24G. Most things work fine, except the hbase web ui. The master gets the RS info port from conf, which is not suitable for this situation of multiple RS instances in a node. So we add the info port to ServerName. a. At startup, the RS reports its info port to HMaster. b. For the root region, the RS writes the servername with info port to the zookeeper root-region-server node. c. For meta regions, the RS writes the servername with info port to the root region. d. For user regions, the RS writes the servername with info port to meta regions. So HMaster and clients can get the info port from the servername. To test this feature, I changed the rs num from 1 to 3 in standalone mode, so we can test it in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this problem? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
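HBase serializes a ServerName as "host,port,startcode"; the actual patch carries the info port through protobuf, but conceptually it amounts to threading one extra field through that identity wherever it is written (zk nodes, meta, heartbeats). A toy standalone sketch; the 4-field string form below is this sketch's assumption, not HBase's real wire format:

```java
public class ServerNameInfo {
    public final String host;
    public final int rpcPort;
    public final long startCode;
    public final int infoPort;

    public ServerNameInfo(String host, int rpcPort, long startCode, int infoPort) {
        this.host = host;
        this.rpcPort = rpcPort;
        this.startCode = startCode;
        this.infoPort = infoPort;
    }

    /** "host,rpcPort,startCode,infoPort"; the 4th field is this sketch's addition. */
    public String encode() {
        return host + "," + rpcPort + "," + startCode + "," + infoPort;
    }

    public static ServerNameInfo decode(String s) {
        String[] p = s.split(",");
        return new ServerNameInfo(p[0], Integer.parseInt(p[1]),
                Long.parseLong(p[2]), Integer.parseInt(p[3]));
    }
}
```

With the info port part of the server identity, the master UI can link to each RS instance on a node instead of assuming one conf-defined port per host.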
[jira] [Commented] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894339#comment-13894339 ] Hudson commented on HBASE-10340: FAILURE: Integrated in hbase-0.96 #283 (See [https://builds.apache.org/job/hbase-0.96/283/]) HBASE-10340 [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node (stack: rev 1565546) * /hbase/branches/0.96/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/branches/0.96/hbase-protocol/src/main/protobuf/HBase.proto * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10296) Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894362#comment-13894362 ] Feng Honghua commented on HBASE-10296: -- bq. The only thing is that, this would require multi master setup unless we keep the logs in hdfs and be able to use a single node RAFT quorum right? Do you mean a master back-compatibility issue? There is no master back-compatibility issue when replacing zk with an embedded consensus lib, nor between different incremental phases. My point when proposing the incremental approach is that the data/functionalities provided by zk have different urgency levels for being moved inside master processes: data such as region assignment info is the most urgent to move inside master processes; configuration-like data is less urgent (it needs an additional watch/notify feature); liveness monitoring is the least urgent (it needs an additional heartbeat feature)... We can remarkably improve master failover performance and eliminate inconsistency by replicating region assignment info among master processes' memory, which is the foremost concern/goal of this jira. To eliminate ZK as a whole and reduce the number of deployed machines/roles, as [~stack] said, ...simplifying hbase packaging and deploy. There would be no more need to set aside machines for this special role, we also need to implement the liveness monitoring (for regionservers) and watch/notify features within master processes... The above features are independent and can be implemented in an incremental fashion; that's what I meant by 'incremental', but certainly we can implement them as a whole.
Not sure I understand your question correctly, and I wonder whether I answered it; any further clarification is welcome :-) Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency -- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua Currently the master relies on ZK to elect the active master, monitor liveness and store almost all of its state, such as region states, table info, replication info and so on. ZK also acts as a channel for master-regionserver communication (such as region assignment) and client-regionserver communication (such as replication state/behavior changes). But zk as a communication channel is fragile due to its one-time watches and asynchronous notification mechanism, which together can lead to missed events (hence missed messages); for example, the master must rely on the idempotence of the state transition logic to keep the region assignment state machine correct. Actually, almost all of the trickiest inconsistency issues can trace their root cause back to the fragility of zk as a communication channel. Replacing zk with paxos running within the master processes has the following benefits: 1. Better master failover performance: all masters, whether active or standby, have the same latest state in memory (except lagging ones, which can eventually catch up later on). Whenever the active master dies, the newly elected active master can immediately play its role without failover work such as rebuilding its in-memory state by consulting the meta table and zk. 2. Better state consistency: the master's in-memory state is the only truth about the system, which eliminates inconsistency from the very beginning. And though the state is held by all masters, paxos guarantees the copies are identical at any time. 3.
More direct and simple communication patterns: clients change state by sending requests to the master, and master and regionservers talk directly to each other by sending requests and responses, with no third-party storage like zk in the middle, which can introduce more uncertainty, worse latency and more complexity. 4. zk would only be used for liveness monitoring, to determine whether a regionserver is dead, and later on we can eliminate zk totally once we build a heartbeat between master and regionservers. I know this might look like a very crazy re-architecture, but it deserves deep thinking and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
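The heartbeat-based liveness monitoring mentioned in point 4 above can be sketched as a minimal in-memory tracker on the master side. This is an illustrative sketch only; the class and method names are hypothetical, not from the HBase codebase:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HeartbeatTracker {
    private final Map<String, Long> lastBeat = new ConcurrentHashMap<>();
    private final long timeoutMs;

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // A regionserver reports in; the master records the time of the beat.
    void heartbeat(String server, long nowMs) {
        lastBeat.put(server, nowMs);
    }

    // A server is considered dead once it has missed beats for timeoutMs,
    // replacing the zk ephemeral-node check with a direct master-side test.
    boolean isAlive(String server, long nowMs) {
        Long t = lastBeat.get(server);
        return t != null && nowMs - t <= timeoutMs;
    }

    public static void main(String[] args) {
        HeartbeatTracker tracker = new HeartbeatTracker(3000);
        tracker.heartbeat("rs1", 1000);
        System.out.println(tracker.isAlive("rs1", 2000)); // prints "true"
        System.out.println(tracker.isAlive("rs1", 5000)); // prints "false"
    }
}
```

Timestamps are passed in explicitly here so the logic is testable; a real implementation would use a monotonic clock and handle server restarts.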
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894369#comment-13894369 ] Hadoop QA commented on HBASE-3909: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627573/HBASE-3909-backport-from-fb-for-trunk-4.patch against trunk revision . ATTACHMENT ID: 12627573 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: +private UpdateConfigurationRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private UpdateConfigurationResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } + * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code> + * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code> +private UpdateConfigurationRequest(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } +private UpdateConfigurationResponse(boolean noInit) { this.unknownFields = com.google.protobuf.UnknownFieldSet.getDefaultInstance(); } + * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code> + * <code>rpc UpdateConfiguration(.UpdateConfigurationRequest) returns (.UpdateConfigurationResponse);</code> {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8626//console This message is automatically generated. 
Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic
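The dynamic-config feature being backported here boils down to an observer pattern: components register for notification and re-read their settings when an UpdateConfiguration-style request arrives, instead of requiring a process restart. A minimal sketch under that assumption (the class names and the sample property are illustrative, not the actual HBase API):

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

public class DynamicConfig {
    // A component that wants to pick up config changes at runtime.
    public interface ConfigurationObserver {
        void onConfigurationChange(Map<String, String> conf);
    }

    private final Map<String, String> conf = new ConcurrentHashMap<>();
    private final List<ConfigurationObserver> observers = new CopyOnWriteArrayList<>();

    public void register(ConfigurationObserver o) {
        observers.add(o);
    }

    // Handler for an UpdateConfiguration-style RPC: merge the new values
    // and notify every registered component so it can re-read its settings.
    public void updateConfiguration(Map<String, String> newConf) {
        conf.putAll(newConf);
        for (ConfigurationObserver o : observers) {
            o.onConfigurationChange(conf);
        }
    }

    public String get(String key) {
        return conf.get(key);
    }

    public static void main(String[] args) {
        DynamicConfig dc = new DynamicConfig();
        dc.register(c -> System.out.println("reloaded: " + c));
        dc.updateConfiguration(Map.of("example.handler.count", "60"));
    }
}
```

CopyOnWriteArrayList keeps notification safe even if an observer registers concurrently with an update; in exchange, registration is O(n), which is fine for a small, mostly-static observer set.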
[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10185: Status: Open (was: Patch Available) HBaseClient retries even though a DoNotRetryException was thrown Key: HBASE-10185 URL: https://issues.apache.org/jira/browse/HBASE-10185 Project: HBase Issue Type: Bug Components: IPC/RPC Affects Versions: 0.94.12, 0.99.0 Reporter: Samarth Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10185.v1.patch, 10185.v2.patch Throwing a DoNotRetryIOException inside the Writable.write(DataOutput) method doesn't prevent HBase from retrying. Debugging the code locally, I figured that the bug lies in the way HBaseClient simply throws an IOException when it sees that a connection has been closed unexpectedly. Method: public Writable call(Writable param, InetSocketAddress addr, Class<? extends VersionedProtocol> protocol, User ticket, int rpcTimeout) Excerpt of the code where the bug is present: while (!call.done) { if (connection.shouldCloseConnection.get()) { throw new IOException("Unexpected closed connection"); } Throwing this IOException causes ServerCallable.translateException(t) to be a no-op, resulting in HBase retrying. From my limited view and understanding of the code, one way I could think of handling this is by looking at the closeException member variable of a connection to determine what kind of exception should be thrown. Specifically, when a connection is closed, the current code does this: protected synchronized void markClosed(IOException e) { if (shouldCloseConnection.compareAndSet(false, true)) { closeException = e; notifyAll(); } } Within HBaseClient's call method, the code could possibly be modified to: while (!call.done) { if (connection.shouldCloseConnection.get()) { if (connection.closeException instanceof DoNotRetryIOException) { throw connection.closeException; } throw new IOException("Unexpected closed connection"); } } -- This message was sent by Atlassian JIRA (v6.1.5#6160)
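The modification proposed in the report can be condensed into a small self-contained model. The `Connection`, `classify`, and stub exception class below are stand-ins for illustration, not the real HBaseClient internals:

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Stub standing in for org.apache.hadoop.hbase.DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
}

// Minimal model of the connection state that the call loop inspects.
class Connection {
    final AtomicBoolean shouldCloseConnection = new AtomicBoolean(false);
    volatile IOException closeException;

    synchronized void markClosed(IOException e) {
        if (shouldCloseConnection.compareAndSet(false, true)) {
            closeException = e;
            notifyAll();
        }
    }
}

public class RetryDemo {
    // The proposed check: surface a DoNotRetryIOException as-is so that
    // exception translation can stop the retry loop, instead of wrapping
    // every closed connection in a generic (retryable) IOException.
    static IOException classify(Connection c) {
        if (c.closeException instanceof DoNotRetryIOException) {
            return c.closeException;
        }
        return new IOException("Unexpected closed connection");
    }

    public static void main(String[] args) {
        Connection c = new Connection();
        c.markClosed(new DoNotRetryIOException("bad request"));
        IOException e = classify(c);
        System.out.println(e instanceof DoNotRetryIOException); // prints "true"
    }
}
```

The key point is that the exception's concrete type carries the retry decision, so losing it at the connection boundary silently turns a non-retryable failure into a retryable one.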
[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10185: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10185: Attachment: 10185.v2.patch
[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894391#comment-13894391 ] Nicolas Liochon commented on HBASE-10185: - v2 cleans up the way we manage the 'calls' list. Synchronization should be as today everywhere.
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Component/s: test TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
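A common fix for this kind of timing flakiness is to poll for the condition under a deadline instead of relying on a single fixed `Thread.sleep()` that scheduling jitter can outrun. A generic sketch of that pattern (this is not the actual HBASE-10480 patch; names and timings are illustrative):

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

public class WaitForRolls {
    // Poll until at least minRolls log rolls are observed or the deadline
    // passes, rather than sleeping (minRolls + 1) * LOG_ROLL_PERIOD once.
    static boolean waitForMinRolls(IntSupplier rollCount, int minRolls,
                                   long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (rollCount.getAsInt() >= minRolls) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        return rollCount.getAsInt() >= minRolls;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger rolls = new AtomicInteger();
        // Simulate a roller that completes a roll every 20 ms.
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                try {
                    Thread.sleep(20);
                } catch (InterruptedException e) {
                    return;
                }
                rolls.incrementAndGet();
            }
        });
        roller.start();
        System.out.println(waitForMinRolls(rolls::get, 5, 2000, 10));
        roller.join();
    }
}
```

The deadline makes slow environments pass (the wait extends up to the timeout) while keeping fast runs fast (the loop exits as soon as the count is reached).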
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Attachment: HBASE-10480-v0.patch v0 adds some extra sleep in case we don't yet have the expected number of logs
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Attachment: HBASE-10480-v0.patch
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Attachment: (was: HBASE-10480-v0.patch)
[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894456#comment-13894456 ] Hadoop QA commented on HBASE-10185: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627588/10185.v2.patch against trunk revision . ATTACHMENT ID: 12627588 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8627//console This message is automatically generated. 
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894460#comment-13894460 ] Hadoop QA commented on HBASE-10480: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627590/HBASE-10480-v0.patch against trunk revision . ATTACHMENT ID: 12627590 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8628//console This message is automatically generated. TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. 
See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s); the accumulated extra delay, 1.5s * 4 (minRolls - 1), exceeds one LOG_ROLL_PERIOD (4s), so the fixed sleep ends before the last roll. This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10334) RegionServer links in table.jsp is broken
[ https://issues.apache.org/jira/browse/HBASE-10334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894462#comment-13894462 ] Hudson commented on HBASE-10334: SUCCESS: Integrated in hbase-0.96-hadoop2 #196 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/196/]) HBASE-10334 RegionServer links in table.jsp is broken (stack: rev 1565547) * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp RegionServer links in table.jsp is broken - Key: HBASE-10334 URL: https://issues.apache.org/jira/browse/HBASE-10334 Project: HBase Issue Type: Bug Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-10334_v1.patch The links to RS's seems to be broken in table.jsp after HBASE-9892. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894464#comment-13894464 ] Hudson commented on HBASE-10340: SUCCESS: Integrated in hbase-0.96-hadoop2 #196 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/196/]) HBASE-10340 [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node (stack: rev 1565546) * /hbase/branches/0.96/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/branches/0.96/hbase-protocol/src/main/protobuf/HBase.proto * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. 
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894463#comment-13894463 ] Hudson commented on HBASE-9892: --- SUCCESS: Integrated in hbase-0.96-hadoop2 #196 (See [https://builds.apache.org/job/hbase-0.96-hadoop2/196/]) HBASE-10340 [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node (stack: rev 1565546) * /hbase/branches/0.96/hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/HBaseProtos.java * /hbase/branches/0.96/hbase-protocol/src/main/protobuf/HBase.proto * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/MasterStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/master/RegionServerListTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/jamon/org/apache/hadoop/hbase/tmpl/regionserver/RSStatusTmpl.jamon * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/LocalHBaseCluster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSDumpServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RSStatusServlet.java * /hbase/branches/0.96/hbase-server/src/main/java/org/apache/hadoop/hbase/zookeeper/RegionServerTracker.java * /hbase/branches/0.96/hbase-server/src/main/resources/hbase-webapps/master/table.jsp Add info port to ServerName to support multi instances in a node Key: HBASE-9892 URL: https://issues.apache.org/jira/browse/HBASE-9892 Project: HBase Issue Type: Improvement Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Critical Fix For: 0.98.0, 0.99.0 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff, HBASE-9892-0.94-v5.diff, HBASE-9892-0.94-v6.diff, HBASE-9892-trunk-v1.diff, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v1.patch, HBASE-9892-trunk-v2.patch, HBASE-9892-trunk-v3.diff, HBASE-9892-v5.txt The full GC time of a regionserver with a big heap (30G) usually cannot be kept under 30s, while at the same time servers with 64G of memory are otherwise fine. So we deploy multiple RS instances (2-3) in a single node, each with a heap of about 20G ~ 24G. Most things work fine, except the hbase web ui: the master gets the RS info port from the conf, which is not suitable for this situation of multiple RS instances in a node. So we add the info port to ServerName: a. at startup, the RS reports its info port to HMaster; b. for the root region, the RS writes the servername with info port to the zookeeper root-region-server node; c. for meta regions, the RS writes the servername with info port to the root region; d. for user regions, the RS writes the servername with info port to the meta regions. HMaster and clients can then get the info port from the servername. To test this feature, I changed the rs num from 1 to 3 in standalone mode, so it can be tested in standalone mode. I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know how Hoya handles this? PS: There are different formats for the servername in the zk node and the meta table; I think we need to unify them and refactor the code. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
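The propagation in steps a-d boils down to carrying one extra field alongside the existing server identity. A minimal standalone sketch of the idea (field names and the URL path are illustrative, not HBase's actual ServerName encoding):

```java
// Sketch: carry the info (web UI) port along with the RPC endpoint so the
// master and clients can build correct UI links when several region servers
// share one host. Names here are hypothetical, not the HBase API.
public class ServerNameSketch {
    final String host;
    final int rpcPort;
    final long startCode;
    final int infoPort; // the newly carried field

    ServerNameSketch(String host, int rpcPort, long startCode, int infoPort) {
        this.host = host;
        this.rpcPort = rpcPort;
        this.startCode = startCode;
        this.infoPort = infoPort;
    }

    // The UI link the master would render for this region server.
    String infoUrl() {
        return "http://" + host + ":" + infoPort + "/rs-status";
    }

    public static void main(String[] args) {
        // Two instances on the same host differ only in ports.
        ServerNameSketch a = new ServerNameSketch("quirinus", 56476, 1391727745710L, 60030);
        ServerNameSketch b = new ServerNameSketch("quirinus", 56477, 1391727745711L, 60031);
        System.out.println(a.infoUrl());
        System.out.println(b.infoUrl());
    }
}
```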
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894557#comment-13894557 ] Hadoop QA commented on HBASE-10480: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627595/HBASE-10480-v0.patch against trunk revision . ATTACHMENT ID: 12627595 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8629//console This message is automatically generated.
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894558#comment-13894558 ] Ted Yu commented on HBASE-10480: Can the sleep interval be made shorter inside the retry loop ?
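The retry-loop idea can be sketched as follows: poll the roll counter in short intervals with a deadline instead of a single long `Thread.sleep`. The helper name and the `AtomicInteger` standing in for the WAL listener's roll counter are assumptions for illustration, not the actual patch:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a polling wait: check the observed roll count every quarter
// period and return as soon as enough rolls have happened, so slow rolls
// (as in the quirinus failure) only extend the wait, never undercount.
public class RollWait {
    static final long LOG_ROLL_PERIOD = 200; // shortened for the demo; 4000ms in the test
    static final int minRolls = 3;

    static boolean waitForRolls(AtomicInteger rollCount, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (rollCount.get() >= minRolls) {
                return true; // enough rolls observed; stop sleeping early
            }
            Thread.sleep(LOG_ROLL_PERIOD / 4); // short poll interval
        }
        return rollCount.get() >= minRolls; // final check at the deadline
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger rolls = new AtomicInteger();
        // Simulate a roller that is ~1.5x slower than LOG_ROLL_PERIOD; a
        // fixed (minRolls + 1) * LOG_ROLL_PERIOD sleep could miss it.
        Thread roller = new Thread(() -> {
            try {
                for (int i = 0; i < minRolls; i++) {
                    Thread.sleep(LOG_ROLL_PERIOD + 100);
                    rolls.incrementAndGet();
                }
            } catch (InterruptedException ignored) { }
        });
        roller.start();
        boolean ok = waitForRolls(rolls, (minRolls + 1) * LOG_ROLL_PERIOD * 2);
        roller.join();
        System.out.println(ok);
    }
}
```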
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Attachment: HBASE-10413-2.patch latest patch with unit test category added Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413.patch InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling: we have jobs that are supposed to finish within a limited time, but they often get stuck in the last mapper working on a large region. Can we implement this method ? What is the best way ? We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the size on HDFS for the given region and column family. Update: This ticket was about a production issue - I talked with the person who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Release Note: TableSplit.getLength() now returns the correct size of the region in bytes. It is used by the M/R framework for better scheduling. Status: Patch Available (was: In Progress)
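The approach can be sketched without HBase dependencies: cache per-region sizes up front (in the real patch these would come from region load reported by the cluster) and have each split's getLength() report its region's cached size. Class and method names below are illustrative, not the actual patch:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a region-size calculator: a map from region name to
// size in bytes, populated once, consulted by each split when the M/R
// framework asks for its length to sort splits.
public class RegionSizeSketch {
    private final Map<String, Long> sizes = new HashMap<>();

    public void setRegionSize(String regionName, long sizeBytes) {
        sizes.put(regionName, sizeBytes);
    }

    // Unknown regions fall back to 0, preserving the old getLength() behavior.
    public long getRegionSize(String regionName) {
        return sizes.getOrDefault(regionName, 0L);
    }

    public static void main(String[] args) {
        RegionSizeSketch calc = new RegionSizeSketch();
        long megabyte = 1024L * 1024L;
        // Region load reports sizes in MB, so the accumulation must use a
        // long: a region larger than 2TB would overflow an int of bytes.
        calc.setRegionSize("largeRegion", 2L * Integer.MAX_VALUE * megabyte);
        System.out.println(calc.getRegionSize("largeRegion"));
        System.out.println(calc.getRegionSize("unknownRegion"));
    }
}
```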
[jira] [Updated] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10313: -- Attachment: 10313v2.098.txt What I applied to 0.98 Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.098.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
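A quick way to spot the duplication reported above is to group the jars in lib/ by artifact name (the part before the first version digit) and flag artifacts that appear more than once. The helper below is a hypothetical illustration, not part of the patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Group jar file names by artifact id; any artifact with more than one
// entry is shipped twice (e.g. servlet-api-2.5.jar next to
// servlet-api-2.5-6.1.14.jar).
public class DupJars {
    static final Pattern VERSION = Pattern.compile("-\\d.*\\.jar$");

    static Map<String, List<String>> groupByArtifact(List<String> jars) {
        Map<String, List<String>> byArtifact = new TreeMap<>();
        for (String jar : jars) {
            // Strip "-<version>.jar" to recover the artifact name.
            String artifact = VERSION.matcher(jar).replaceFirst("");
            byArtifact.computeIfAbsent(artifact, k -> new ArrayList<>()).add(jar);
        }
        return byArtifact;
    }

    public static void main(String[] args) {
        List<String> lib = Arrays.asList(
            "servlet-api-2.5-6.1.14.jar", "servlet-api-2.5.jar",
            "jsp-api-2.1-6.1.14.jar", "jsp-api-2.1.jar", "guava-12.0.1.jar");
        groupByArtifact(lib).forEach((artifact, files) -> {
            if (files.size() > 1) {
                System.out.println("duplicate: " + artifact + " -> " + files);
            }
        });
    }
}
```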
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894733#comment-13894733 ] stack commented on HBASE-10480: --- +1
[jira] [Commented] (HBASE-10483) Provide API for retrieving info port when hbase.master.info.port is set to 0
[ https://issues.apache.org/jira/browse/HBASE-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894737#comment-13894737 ] stack commented on HBASE-10483: --- Doing it the way [~liushaohui] suggests would have master info port align w/ how info port is done for regionserver HBASE-9892 -- makes sense. Provide API for retrieving info port when hbase.master.info.port is set to 0 Key: HBASE-10483 URL: https://issues.apache.org/jira/browse/HBASE-10483 Project: HBase Issue Type: Improvement Reporter: Ted Yu When hbase.master.info.port is set to 0, info port is dynamically determined. An API should be provided so that client can retrieve the actual info port. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10313: -- Resolution: Fixed Status: Resolved (was: Patch Available) Committed to 0.96, 0.98, and trunk.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894758#comment-13894758 ] Hadoop QA commented on HBASE-10413: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627643/HBASE-10413-2.patch against trunk revision . ATTACHMENT ID: 12627643 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: + LOG.debug(MessageFormat.format(Region {0} has size {1}, regionLoad.getNameAsString(), regionSizeBytes)); +assertEquals((2 * ((long) Integer.MAX_VALUE)) * megabyte, calculator.getRegionSize(largeRegion.getBytes())); {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8630//console This message is automatically generated. 
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894760#comment-13894760 ] Ted Yu commented on HBASE-3909: --- Patch is of large size. Mind posting it on review board ? Thanks Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev, and Todd suggested we look at how Hadoop did it over in HADOOP-7001 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10463) Filter on columns containing numerics yield wrong results
[ https://issues.apache.org/jira/browse/HBASE-10463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894767#comment-13894767 ] Nick Dimiduk commented on HBASE-10463: -- Sounds good [~deepa.vasanthkumar]. How would you like to proceed? Do you have a new comparator to contribute or perhaps a documentation update? Shall I close this issue as Not a Problem? Filter on columns containing numerics yield wrong results - Key: HBASE-10463 URL: https://issues.apache.org/jira/browse/HBASE-10463 Project: HBase Issue Type: Improvement Components: Filters Affects Versions: 0.94.8 Reporter: Deepa Vasanthkumar Original Estimate: 168h Remaining Estimate: 168h Used SingleColumnValueFilter with CompareFilter.CompareOp.GREATER_OR_EQUAL for filtering the scan result. However, for columns that hold numeric values the scan result is not correct, because of the lexicographic comparison. Does HBase support numeric value filters (for equal, greater or equal..) for columns ? If not, can we add it? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
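The underlying problem is that byte-by-byte comparison of stringified numbers does not match numeric order, while fixed-width big-endian encodings do (for values of the same sign). A standalone illustration of that background, not HBase code:

```java
import java.nio.ByteBuffer;

// Compare byte arrays the way a lexicographic filter does, and show that
// "10" sorts before "9" as strings, while 8-byte big-endian longs keep
// numeric order for non-negative values.
public class LexDemo {
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff); // unsigned byte compare
            if (d != 0) return d;
        }
        return a.length - b.length; // shorter prefix sorts first
    }

    static byte[] longBytes(long v) {
        return ByteBuffer.allocate(8).putLong(v).array(); // big-endian
    }

    public static void main(String[] args) {
        // "10" < "9" lexicographically because '1' < '9' -- the wrong order
        // for a GREATER_OR_EQUAL filter over stringified numbers.
        System.out.println(compareBytes("10".getBytes(), "9".getBytes()) < 0);
        // Fixed-width big-endian longs compare in numeric order (for
        // non-negative values; negative longs have the high bit set and
        // would sort after positives byte-wise).
        System.out.println(compareBytes(longBytes(10L), longBytes(9L)) > 0);
    }
}
```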
[jira] [Updated] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-5356: --- Resolution: Fixed Fix Version/s: 0.99.0 0.96.2 0.98.0 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevant stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: TestLoadAndVerify_1328735001040, row=TestLoadAnd\ Verify_1328735001040,yC^P\xD7\x945\xD4,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:64\ 9) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703\ ) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57) at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.\ java:1018) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535) at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525) at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380) at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137) at 
usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171)
at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326)
at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535)
at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133)
at org.jruby.runtime.BlockBody.call(BlockBody.java:73)
at org.jruby.runtime.Block.call(Block.java:89)
at org.jruby.RubyProc.call(RubyProc.java:268)
at org.jruby.RubyProc.call(RubyProc.java:228)
at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
at
[jira] [Commented] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894780#comment-13894780 ] Jimmy Xiang commented on HBASE-5356: Integrated into trunk, 0.98, and 0.96. Thanks. region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevant stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META.
for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416)
at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018)
at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104)
at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525)
at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380)
at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
at
usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171)
at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326)
at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535)
at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133)
at org.jruby.runtime.BlockBody.call(BlockBody.java:73)
at org.jruby.runtime.Block.call(Block.java:89)
at org.jruby.RubyProc.call(RubyProc.java:268)
at org.jruby.RubyProc.call(RubyProc.java:228)
at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
at org.jruby.ast.CallOneArgNode.interpret(CallOneArgNode.java:57)
at
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894790#comment-13894790 ] Lukas Nalezenec commented on HBASE-10413: - Re: + long regionSizeBytes = (memSize + fileSize) * megaByte; Does memstore size have to be included? I am not sure. What are the pros and cons? Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the size of the HDFS data for the given region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
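The size calculation being discussed can be sketched in plain Java. This is a minimal, self-contained illustration of the idea (class and field names are hypothetical, not the actual HBASE-10413 patch): region sizes reported in megabytes are converted to bytes, and splits are sorted largest-first so the biggest region does not end up as the straggling last mapper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SplitLengthSketch {
    // Hypothetical stand-in for TableSplit: only the length matters here.
    static final class Split {
        final String region;
        final long lengthBytes;
        Split(String region, long lengthBytes) {
            this.region = region;
            this.lengthBytes = lengthBytes;
        }
    }

    static final long MEGABYTE = 1024L * 1024L;

    // Per the review discussion: use store file size only, not memstore size.
    static long regionSizeBytes(long storeFileSizeMB) {
        return storeFileSizeMB * MEGABYTE;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>(Arrays.asList(
            new Split("region-a", regionSizeBytes(128)),
            new Split("region-b", regionSizeBytes(2048)),
            new Split("region-c", regionSizeBytes(512))));
        // Largest split first, so schedulers can start the big regions early.
        splits.sort(Comparator.comparingLong((Split s) -> s.lengthBytes).reversed());
        for (Split s : splits) {
            System.out.println(s.region + " " + s.lengthBytes);
        }
    }
}
```

The long multiplication matters: with an `int` megabyte constant, a region above 2047 MB would overflow.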
[jira] [Commented] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894796#comment-13894796 ] Lars Hofhansl commented on HBASE-10340: --- Thanks [~stack]! [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894799#comment-13894799 ] Ted Yu commented on HBASE-10413: The patch enhances TableInputFormat, which deals with store files, right? We don't know when the memstore would be flushed. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the size of the HDFS data for the given region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-3909) Add dynamic config
[ https://issues.apache.org/jira/browse/HBASE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894815#comment-13894815 ] Ted Yu commented on HBASE-3909: --- {code} + * Copyright 2013 The Apache Software Foundation {code} Year is not needed. {code} +public interface ConfigurationObserver { {code} Add annotation for audience. {code} + LOG.error("Encountered a NPE while notifying observers."); {code} Please add some information about the observer which caused the NPE. Add dynamic config -- Key: HBASE-3909 URL: https://issues.apache.org/jira/browse/HBASE-3909 Project: HBase Issue Type: New Feature Reporter: stack Assignee: Subbu M Iyer Attachments: 3909-102812.patch, 3909-102912.patch, 3909-v1.patch, 3909.v1, 3909_090712-2.patch, HBASE-3909-backport-from-fb-for-trunk-2.patch, HBASE-3909-backport-from-fb-for-trunk-3.patch, HBASE-3909-backport-from-fb-for-trunk-4.patch, HBASE-3909-backport-from-fb-for-trunk.patch, HBase Cluster Config Details.xlsx, patch-v2.patch, testMasterNoCluster.stack I'm sure this issue exists already, at least as part of the discussion around making online schema edits possible, but no harm in this having its own issue. Ted started a conversation on this topic up on dev and Todd suggested we look at how Hadoop did it over in HADOOP-7001. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
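The two review points above (annotate the interface for audience, and name the failing observer in the log) can be sketched with a minimal observer loop. All names below are illustrative stand-ins, not the actual HBASE-3909 patch; the key detail is that a misbehaving observer is reported by class name and does not prevent the remaining observers from being notified.

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// In the real patch this interface would carry an @InterfaceAudience annotation.
interface ConfigurationObserver {
    void onConfigurationChange(Object newConf);
}

class ConfigurationManager {
    // Copy-on-write set: observers can be (de)registered while a notification runs.
    private final Set<ConfigurationObserver> observers = new CopyOnWriteArraySet<>();

    void register(ConfigurationObserver o) {
        observers.add(o);
    }

    void notifyAllObservers(Object newConf) {
        for (ConfigurationObserver o : observers) {
            try {
                o.onConfigurationChange(newConf);
            } catch (NullPointerException e) {
                // Per the review comment: say *which* observer failed.
                System.err.println("Encountered a NPE while notifying observer "
                    + o.getClass().getName() + ": " + e);
            }
        }
    }
}
```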
[jira] [Commented] (HBASE-10472) Manage the interruption in ZKUtil#getData
[ https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894834#comment-13894834 ] Enis Soztutar commented on HBASE-10472: --- bq. Not really, imho: we need the code to be clean when there are interruptions. ZooKeeper API is clean. Our wrapping is not clean today. Ok. I guess we can make ZkUtil methods interruptible one by one in the long term. +1. Manage the interruption in ZKUtil#getData - Key: HBASE-10472 URL: https://issues.apache.org/jira/browse/HBASE-10472 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.98.0, 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.1, 0.99.0 Attachments: 10472.v1.patch, 10472.v2.patch Many, but not all, methods in ZKUtil partly hide the interruption: they return null or something like that. Many times, this will result in an NPE, or something undefined. This jira is limited to getData to keep things small enough (it's used in hbase-client code). The code is supposed to behave at least 'as well as before', or better (hopefully). It could be included in a later .98 release (98.1) and in .99. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
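The cleanup discussed above - not hiding an interruption behind a null return that later surfaces as an undefined NPE - can be illustrated with plain Java. The names are hypothetical and the real ZKUtil signatures differ; the pattern shown is the standard one: translate `InterruptedException` into a checked `InterruptedIOException` and restore the thread's interrupt flag instead of swallowing it.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.Arrays;

public class InterruptibleFetch {
    // Simulated blocking read; stands in for a ZooKeeper getData() call.
    static byte[] blockingRead() throws InterruptedException {
        if (Thread.interrupted()) {
            throw new InterruptedException("interrupted before read");
        }
        return new byte[] {1, 2, 3};
    }

    // Instead of catching InterruptedException and returning null, surface
    // the interruption as a checked IOException and restore the interrupt flag
    // so callers further up the stack can still observe it.
    static byte[] getData() throws IOException {
        try {
            return blockingRead();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                new InterruptedIOException("interrupted while fetching data");
            iioe.initCause(e);
            throw iioe;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(getData()));
    }
}
```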
[jira] [Commented] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894836#comment-13894836 ] Lars Hofhansl commented on HBASE-10363: --- Now of course I see TestImportExport, TestTableMapReduce, and TestImportTsv fail with Hadoop 2 and 0.94 [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.17 Attachments: 10363.txt From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE! Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894845#comment-13894845 ] Lukas Nalezenec commented on HBASE-10413: - OK, memstore size removed. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the size of the HDFS data for the given region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894842#comment-13894842 ] Jonathan Hsieh commented on HBASE-10363: I believe on the internal Cloudera test rigs we've marked them off as flaky. [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.17 Attachments: 10363.txt From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE! Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Attachment: HBASE-10413-4.patch The calculator works only with store file size, not memstore size. Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain a real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problems with scheduling - we have jobs that are supposed to finish in limited time but they often get stuck in the last mapper working on a large region. Can we implement this method? What is the best way? We were thinking about estimating the size by the size of files on HDFS. We would like to get a Scanner from TableSplit, use startRow, stopRow and column families to get the corresponding region, then compute the size of the HDFS data for the given region and column family. Update: This ticket was about a production issue - I talked with the guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894866#comment-13894866 ] Sergey Shelukhin commented on HBASE-10479: -- stack: HConnection is actually Cluster, user-facing, but one cannot simply rename it. I guess I can call a new one ClusterConnectionInternal? I want to keep internal in the name so it would be obvious. bq. boolean isTableAvailable(TableName tableName, byte[][] splitKeys) throws This seems like a valid user method to me. bq. HRegionLocation getRegionLocation(TableName tableName, byte [] row, Already deprecated... note that I am deprecating methods, not removing them. bq. I am not sure about setRegionCachePrefetch() and friends. Do people disable region cache prefetch? Seems like an explicitly user-facing method to me. Whether it's needed is another question. bq. While we are at it, lets change the public fields (public static fields) in HC and HCM as well. Those are constants. bq. Can we also move this to HConnectionInternal : HCM.injectNonceGeneratorForTesting() HCI is an interface; I don't think we want to add that method there and have it propagate to all implementations. bq. HCM.execute() is marked private with annotation. But we still have the same problem in HCM that it contains both public intended methods (createConnection()) and private methods. Can we reduce the visibility or make a HCMInternal or something like that? Same for HCM.findException() and HCM.setServerSideHConnectionRetries() HCM was kind of out of the scope of this patch. We could make a pass-thru from HCM to HCMInternal; because it's static, it's not going to be pretty. All the logic will move to ...Internal, and HCM will just call the internal. You want to do that?
HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to new interface; the deprecated methods would eventually be removed from public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
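The refactoring described above - a separate internal interface inheriting the public one, with the internal methods deprecated in place on the public interface - looks roughly like this. The interface and method names below are illustrative only, not the actual HBASE-10479 patch:

```java
// Public, user-facing interface: methods slated for internal use are
// deprecated here rather than removed outright, preserving compatibility.
interface HConnectionSketch {
    boolean isTableAvailable(String tableName);

    /** @deprecated Internal use only; moved to the internal interface. */
    @Deprecated
    String locateRegion(String tableName, byte[] row);
}

// Internal interface: inherits everything, re-declares the internal methods
// without deprecation, and is where new internal-only methods are added so
// the public surface stops growing.
interface ClusterConnectionSketch extends HConnectionSketch {
    @Override
    String locateRegion(String tableName, byte[] row);

    // New internal-only method; never added to the public interface.
    void clearRegionCache(String tableName);
}
```

Internal callers are migrated to `ClusterConnectionSketch`, and the deprecated declarations can eventually be dropped from the public interface without breaking user code.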
[jira] [Updated] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10479: - Attachment: HBASE-10479.02.patch HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to new interface; the deprecated methods would eventually be removed from public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10363) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles.
[ https://issues.apache.org/jira/browse/HBASE-10363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894863#comment-13894863 ] Lars Hofhansl commented on HBASE-10363: --- TestImportExport fails locally too (with Hadoop 2). I am having a hard time making them work with both Hadoop 1 and Hadoop 2. (See also HBASE-6330: when I use the config from UTIL, Hadoop 2 passes but Hadoop 1 fails; when I use the config from the cluster, both Hadoop 1 and Hadoop 2 fail.) [0.94] TestInputSampler and TestInputSamplerTool fail under hadoop 2.0/23 profiles. --- Key: HBASE-10363 URL: https://issues.apache.org/jira/browse/HBASE-10363 Project: HBase Issue Type: Bug Affects Versions: 0.94.15 Reporter: Jonathan Hsieh Assignee: Lars Hofhansl Priority: Critical Fix For: 0.94.17 Attachments: 10363.txt From tip of 0.94 and from 0.94.15. {code} jon@swoop:~/proj/hbase-0.94$ mvn clean test -Dhadoop.profile=2.0 -Dtest=TestInputSampler,TestInputSamplerTool -PlocalTests ... Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool Tests run: 4, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 3.718 sec FAILURE! Running org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.666 sec FAILURE!
Results : Tests in error: testSplitInterval(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitRamdom(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSample(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSamplerTool): Failed getting constructor testSplitSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor testIntervalSampler(org.apache.hadoop.hbase.mapreduce.hadoopbackport.TestInputSampler): Failed getting constructor Tests run: 6, Failures: 0, Errors: 5, Skipped: 0 {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10479: - Attachment: HBASE-10479.01.patch HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to new interface; the deprecated methods would eventually be removed from public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10479: - Attachment: (was: HBASE-10479.01.patch) HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to new interface; the deprecated methods would eventually be removed from public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894898#comment-13894898 ] Hudson commented on HBASE-10313: FAILURE: Integrated in HBase-TRUNK #4897 (See [https://builds.apache.org/job/HBase-TRUNK/4897/]) HBASE-10313 Duplicate servlet-api jars in hbase 0.96.0 (stack: rev 1565730) * /hbase/trunk/hbase-server/pom.xml * /hbase/trunk/pom.xml Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.098.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
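The duplication above comes from the same servlet/JSP API classes arriving both as standalone artifacts and as Jetty-versioned transitive dependencies. The usual Maven fix is an exclusion on the dependency that drags in the extra copy; the fragment below is a generic, hedged illustration of that technique (artifact coordinates shown for the Jetty 6 era), not the exact HBASE-10313 change:

```xml
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <exclusions>
    <!-- Keep only one copy of the servlet/JSP APIs on the classpath. -->
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api-2.5</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jsp-api-2.1</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

After such a change, `mvn dependency:tree` is the quickest way to confirm that only one servlet-api artifact remains.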
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894904#comment-13894904 ] Hadoop QA commented on HBASE-10413: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627660/HBASE-10413-3.patch against trunk revision . ATTACHMENT ID: 12627660 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.util.TestHBaseFsck Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8631//console This message is automatically generated. 
Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894909#comment-13894909 ] Hudson commented on HBASE-10313: SUCCESS: Integrated in HBase-0.98-on-Hadoop-1.1 #128 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/128/]) HBASE-10313 Duplicate servlet-api jars in hbase 0.96.0 (stack: rev 1565731) * /hbase/branches/0.98/hbase-server/pom.xml * /hbase/branches/0.98/pom.xml Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.098.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
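The committed fix touched hbase-server/pom.xml and the top-level pom.xml. The usual way to stop a duplicated transitive jar from landing in lib/ is a Maven dependency exclusion; the sketch below is illustrative only (the coordinates shown are a plausible source of the `-6.1.14` jars, not necessarily the exact change in the patch):

```xml
<!-- Illustrative sketch: exclude the servlet-api copy that a Jetty
     artifact drags in transitively, so only one servlet-api jar is
     packaged. The exact coordinates in the committed patch may differ. -->
<dependency>
  <groupId>org.mortbay.jetty</groupId>
  <artifactId>jetty</artifactId>
  <exclusions>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>servlet-api-2.5</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jsp-api-2.1</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

After such a change, `mvn dependency:tree` can confirm that only one servlet-api and one jsp-api remain on the classpath.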
[jira] [Updated] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matteo Bertozzi updated HBASE-10480: Resolution: Fixed Fix Version/s: 0.94.17 0.99.0 0.96.2 0.98.0 Status: Resolved (was: Patch Available) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
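The usual direction for this kind of fix is to replace the single fixed-length sleep with a bounded polling wait on the observed roll count, so a slow roller cannot overrun the wait. A minimal, self-contained sketch of that pattern (the AtomicInteger stands in for the test's roll listener; the names are illustrative, not the committed patch):

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch: wait until at least minRolls log rolls have been observed,
 *  rechecking in short intervals instead of sleeping once for a fixed
 *  period that periodic-roll jitter can exceed. */
public class RollWaiter {
    static boolean waitForRolls(AtomicInteger observedRolls, int minRolls,
                                long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (observedRolls.get() < minRolls) {
            if (System.currentTimeMillis() >= deadline) {
                return false;   // give up and let the test fail with context
            }
            Thread.sleep(50);   // short recheck interval
        }
        return true;
    }
}
```

With a generous timeout (say, several multiples of LOG_ROLL_PERIOD), the test still finishes quickly in the common case because it returns as soon as enough rolls have happened.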
[jira] [Commented] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894968#comment-13894968 ] Hudson commented on HBASE-5356: --- SUCCESS: Integrated in HBase-0.98 #139 (See [https://builds.apache.org/job/HBase-0.98/139/]) HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565743) * /hbase/branches/0.98/bin/region_mover.rb region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevent stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57) at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535) at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525) at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380) at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137) at 
usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171) at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326) at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535) at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133) at org.jruby.runtime.BlockBody.call(BlockBody.java:73) at org.jruby.runtime.Block.call(Block.java:89) at org.jruby.RubyProc.call(RubyProc.java:268) at org.jruby.RubyProc.call(RubyProc.java:228) at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894969#comment-13894969 ] Hudson commented on HBASE-10313: SUCCESS: Integrated in HBase-0.98 #139 (See [https://builds.apache.org/job/HBase-0.98/139/]) HBASE-10313 Duplicate servlet-api jars in hbase 0.96.0 (stack: rev 1565731) * /hbase/branches/0.98/hbase-server/pom.xml * /hbase/branches/0.98/pom.xml Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.098.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10313) Duplicate servlet-api jars in hbase 0.96.0
[ https://issues.apache.org/jira/browse/HBASE-10313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894975#comment-13894975 ] Hudson commented on HBASE-10313: FAILURE: Integrated in hbase-0.96 #284 (See [https://builds.apache.org/job/hbase-0.96/284/]) HBASE-10313 Duplicate servlet-api jars in hbase 0.96.0 (stack: rev 1565732) * /hbase/branches/0.96/hbase-server/pom.xml * /hbase/branches/0.96/pom.xml Duplicate servlet-api jars in hbase 0.96.0 -- Key: HBASE-10313 URL: https://issues.apache.org/jira/browse/HBASE-10313 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Critical Fix For: 0.96.2, 0.98.1, 0.99.0 Attachments: 10313.txt, 10313v2.098.txt, 10313v2.txt On mailing list, http://search-hadoop.com/m/wtCkHs5Ujq, [~jerryhe] reports we have doubled jars: {code} [biadmin@hdtest009 lib]$ ls -l jsp-api* -rw-rw-r-- 1 biadmin biadmin 134910 Sep 17 01:13 jsp-api-2.1-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 100636 Sep 17 01:27 jsp-api-2.1.jar [biadmin@hdtest009 lib]$ ls -l servlet-api* -rw-rw-r-- 1 biadmin biadmin 132368 Sep 17 01:13 servlet-api-2.5-6.1.14.jar -rw-rw-r-- 1 biadmin biadmin 105112 Sep 17 01:12 servlet-api-2.5.jar {code} Fix in 0.96.2. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894755#comment-13894755 ] Ted Yu commented on HBASE-10413: {code} +@InterfaceAudience.Private +public class RegionSizeCalculator { {code} @InterfaceAudience.Public should be used. {code} + RegionSizeCalculator (HTable table, HBaseAdmin admin) throws IOException { {code} admin is only used in ctor. Close it in finally block. {code} + long regionSizeBytes = (memSize + fileSize) * megaByte; {code} Does memstore size have to be included ? {code} + LOG.debug(MessageFormat.format(Region {0} has size {1}, regionLoad.getNameAsString(), regionSizeBytes)); {code} Wrap long line - limit is 100 char. License header appears twice in RegionSizeCalculatorTest Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. 
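For context, the idea under review can be sketched in a few lines of plain Java: collect per-region size estimates once up front (store-file size, and, per the question above, possibly memstore size, both reported in megabytes by RegionLoad), then have TableSplit.getLength() look its region up. Everything below is an illustrative stand-in, not the patch itself:

```java
import java.util.HashMap;
import java.util.Map;

/** Hedged sketch of the size-estimation idea behind this ticket's
 *  RegionSizeCalculator: map region name -> estimated size in bytes
 *  so TableSplit.getLength() can return a real value instead of 0.
 *  Names and structure are illustrative, not the actual patch. */
public class RegionSizeSketch {
    private static final long MEGA_BYTE = 1024L * 1024L;
    private final Map<String, Long> sizeByRegion = new HashMap<>();

    /** Record a region's estimated size from RegionLoad-style numbers.
     *  Whether memstore size belongs in the estimate was questioned in
     *  review; it is included here purely for illustration. */
    public void put(String regionName, long storefileSizeMB, long memstoreSizeMB) {
        sizeByRegion.put(regionName, (storefileSizeMB + memstoreSizeMB) * MEGA_BYTE);
    }

    /** What a TableSplit could return from getLength(). */
    public long getRegionSize(String regionName) {
        Long size = sizeByRegion.get(regionName);
        return size == null ? 0L : size;   // unknown region keeps the old behavior
    }
}
```

Once splits carry real lengths, the MapReduce framework can sort them largest-first, which addresses the "stuck on the last large region" scheduling problem described in the ticket.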
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Attachment: HBASE-10413.patch Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Attachment: HBASE-10413-3.patch code review Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894773#comment-13894773 ] Jimmy Xiang commented on HBASE-5356: Good question. I put a break point before moving a region, then I tried to move a region after the table was either disabled or deleted. The script works well for these scenarios now. region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevent stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57) at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535) at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525) at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380) at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137) at 
usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171) at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326) at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535) at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133) at org.jruby.runtime.BlockBody.call(BlockBody.java:73) at org.jruby.runtime.Block.call(Block.java:89) at org.jruby.RubyProc.call(RubyProc.java:268) at org.jruby.RubyProc.call(RubyProc.java:228) at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
[jira] [Commented] (HBASE-10296) Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency
[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894862#comment-13894862 ] Enis Soztutar commented on HBASE-10296: --- Agreed that the most urgent thing is zk assignment. There have been at least 3 proposals so far for a master + assignment rewrite in HBASE-5487, and all want to get rid of zk and fix assignment. What I was trying to understand is the deployment. I was assuming the RAFT quorum servers would be master processes as well. Currently it is sufficient to have 1 master, 1 backup and 3 zk servers for HA. With some master functionality implemented with RAFT but still using zk, we would need at least 3 zk servers and 3 master servers for full HA, which is a change in the requirement for a minimum HA setup. However, with the incremental approach, we might even implement the RAFT quorum inside region server processes, so that we gradually get rid of the master role as well, and have only 1 type of server, where (2n+1) of them would act like masters (while still serving data). How do you imagine the typical small / medium sized deployment will be? Replace ZK with a consensus lib(paxos,zab or raft) running within master processes to provide better master failover performance and state consistency -- Key: HBASE-10296 URL: https://issues.apache.org/jira/browse/HBASE-10296 Project: HBase Issue Type: Brainstorming Components: master, Region Assignment, regionserver Reporter: Feng Honghua Currently master relies on ZK to elect active master, monitor liveness and store almost all of its states, such as region states, table info, replication info and so on. And zk also serves as a channel for master-regionserver communication (such as in region assigning) and client-regionserver communication (such as replication state/behavior change). 
But zk as a communication channel is fragile due to its one-time watch and asynchronous notification mechanism, which together can lead to missed events (hence missed messages); for example, the master must rely on the state transition logic's idempotence to maintain the region assigning state machine's correctness. Actually, almost all of the trickiest inconsistency issues can trace their root cause back to the fragility of zk as a communication channel. Replacing zk with paxos running within master processes has the following benefits:
1. Better master failover performance: all masters, whether active or standby, have the same latest states in memory (except lagging ones, which can eventually catch up later). Whenever the active master dies, the newly elected active master can immediately play its role without such failover work as rebuilding its in-memory states by consulting the meta-table and zk.
2. Better state consistency: the master's in-memory states are the only truth about the system, which eliminates inconsistency from the very beginning. And though the states are held by all masters, paxos guarantees they are identical at any time.
3. A more direct and simple communication pattern: clients change state by sending requests to the master; master and regionserver talk directly to each other by sending requests and responses... none of them needs third-party storage like zk, which can introduce more uncertainty, worse latency and more complexity.
4. zk would then be used only for liveness monitoring, to determine whether a regionserver is dead, and later on we can eliminate zk totally once we build heartbeats between master and regionserver.
I know this might look like a very drastic re-architecture, but it deserves deep thinking and serious discussion, right? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894990#comment-13894990 ] Enis Soztutar commented on HBASE-10479: --- bq. This seems valid user method to me. There are three versions. The method isTableAvailable(TableName) is fine as user facing. Not the override isTableAvailable(TableName tableName, byte[][] splitKeys). You can see from the javadoc. bq. Already deprecated There are multiple methods named getRegionLocation(). I thought one of them was not deprecated. Fine if you are sure every method that deals with region locations is deprecated. bq. There are constants. I mean please reduce the visibility for the constants if you can. They are internal afaik. bq. All the logic will move to ...Internal, and HCM will just call the internal. You want to do that? That would be good if we can do it. HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for the public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to the new interface; the deprecated methods would eventually be removed from the public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
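The refactor being discussed can be sketched as a slim public interface plus an internal sub-interface that absorbs the implementation-only methods. All names below are placeholders standing in for HConnection and its internal counterpart, not the actual patch:

```java
/** Public surface: only methods that make sense for end users.
 *  (Stands in for HConnection.) */
interface PublicConnection {
    boolean isTableAvailable(String tableName);

    /** @deprecated internal-use overload, kept only until callers migrate. */
    @Deprecated
    boolean isTableAvailable(String tableName, byte[][] splitKeys);
}

/** Internal surface: new internal-only methods land here instead of
 *  widening the public interface. (Hypothetical internal interface.) */
interface InternalConnection extends PublicConnection {
    void cacheRegionLocation(String regionName);   // illustrative internal method
}

/** Minimal stub showing that internal code programs against
 *  InternalConnection while users only ever see PublicConnection. */
class StubConnection implements InternalConnection {
    public boolean isTableAvailable(String tableName) { return true; }
    @Deprecated
    public boolean isTableAvailable(String tableName, byte[][] splitKeys) {
        return isTableAvailable(tableName);   // delegate to the public method
    }
    public void cacheRegionLocation(String regionName) { /* no-op stub */ }
}
```

The payoff is that deprecated methods can later be dropped from the public type without touching internal callers, which keep using the internal interface.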
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894991#comment-13894991 ] Hadoop QA commented on HBASE-10413: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627668/HBASE-10413-4.patch against trunk revision . ATTACHMENT ID: 12627668 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 5 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8632//console This message is automatically generated. 
Tablesplit.getLength returns 0 -- Key: HBASE-10413 URL: https://issues.apache.org/jira/browse/HBASE-10413 Project: HBase Issue Type: Bug Components: Client, mapreduce Affects Versions: 0.96.1.1 Reporter: Lukas Nalezenec Assignee: Lukas Nalezenec Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, HBASE-10413-4.patch, HBASE-10413.patch InputSplits should be sorted by length but TableSplit does not contain real getLength implementation: @Override public long getLength() { // Not clear how to obtain this... seems to be used only for sorting splits return 0; } This is causing us problem with scheduling - we have got jobs that are supposed to finish in limited time but they get often stuck in last mapper working on large region. Can we implement this method ? What is the best way ? We were thinking about estimating size by size of files on HDFS. We would like to get Scanner from TableSplit, use startRow, stopRow and column families to get corresponding region than computing size of HDFS for given region and column family. Update: This ticket was about production issue - I talked with guy who worked on this and he said our production issue was probably not directly caused by getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10472) Manage the interruption in ZKUtil#getData
[ https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895011#comment-13895011 ] Nicolas Liochon commented on HBASE-10472: - bq. we can make ZkUtil methods interruptible one by one in the long term. Yep. Thanks for the review, I've committed to the .99. Manage the interruption in ZKUtil#getData - Key: HBASE-10472 URL: https://issues.apache.org/jira/browse/HBASE-10472 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.98.0, 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.1, 0.99.0 Attachments: 10472.v1.patch, 10472.v2.patch Many, but not all, methods in ZKUtil partly hides the interruption: they return null or something like that. Many times, this will result in a NPE, or something undefined. This jira is limited to the getData to keep things small enough (it's used in hbase-client code). The code is supposed to behave at least 'as well as before', or better (hopefully). It could be included in a later .98 release (98.1) and in .99 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
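The pattern involved here (stop swallowing InterruptedException and returning null, which callers turn into NPEs) looks roughly like this in isolation. The BlockingQueue stands in for the blocking ZooKeeper call; this shows the general idiom, not the committed ZKUtil code:

```java
import java.io.InterruptedIOException;
import java.util.concurrent.BlockingQueue;

/** Sketch of an interruption-aware read: on interrupt, restore the
 *  thread's interrupt flag and throw a checked exception, rather than
 *  returning null and pushing an NPE onto the caller. */
public class InterruptibleRead {
    static byte[] getData(BlockingQueue<byte[]> source) throws InterruptedIOException {
        try {
            return source.take();   // stands in for the blocking ZK getData call
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // don't hide the interruption
            InterruptedIOException iioe =
                new InterruptedIOException("interrupted while reading data");
            iioe.initCause(e);
            throw iioe;
        }
    }
}
```

Because InterruptedIOException is an IOException, existing callers that already handle IOExceptions get defined behavior "at least as good as before", which matches the goal stated in the ticket.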
[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData
[ https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10472: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Manage the interruption in ZKUtil#getData - Key: HBASE-10472 URL: https://issues.apache.org/jira/browse/HBASE-10472 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.98.0, 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.98.1, 0.99.0 Attachments: 10472.v1.patch, 10472.v2.patch Many, but not all, methods in ZKUtil partly hides the interruption: they return null or something like that. Many times, this will result in a NPE, or something undefined. This jira is limited to the getData to keep things small enough (it's used in hbase-client code). The code is supposed to behave at least 'as well as before', or better (hopefully). It could be included in a later .98 release (98.1) and in .99 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10472) Manage the interruption in ZKUtil#getData
[ https://issues.apache.org/jira/browse/HBASE-10472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicolas Liochon updated HBASE-10472: Fix Version/s: (was: 0.98.1) Manage the interruption in ZKUtil#getData - Key: HBASE-10472 URL: https://issues.apache.org/jira/browse/HBASE-10472 Project: HBase Issue Type: Bug Components: Client, master, regionserver Affects Versions: 0.98.0, 0.99.0 Reporter: Nicolas Liochon Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10472.v1.patch, 10472.v2.patch Many, but not all, methods in ZKUtil partly hides the interruption: they return null or something like that. Many times, this will result in a NPE, or something undefined. This jira is limited to the getData to keep things small enough (it's used in hbase-client code). The code is supposed to behave at least 'as well as before', or better (hopefully). It could be included in a later .98 release (98.1) and in .99 -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895037#comment-13895037 ] Hudson commented on HBASE-10480: SUCCESS: Integrated in HBase-0.94-security #403 (See [https://builds.apache.org/job/HBase-0.94-security/403/]) HBASE-10480 TestLogRollPeriod#testWithEdits may fail due to insufficient waiting (mbertozzi: rev 1565776) * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollPeriod.java TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 
2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
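One common hedge against this kind of timing race is to poll the observed roll count against a generous deadline instead of doing a single fixed sleep. A minimal stand-alone sketch, assuming an `AtomicInteger` counter that the test's listener would increment (`RollWait` is illustrative, not the test's actual code):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch: wait for a roll counter to reach minRolls by polling
// with a deadline, rather than one fixed Thread.sleep((minRolls + 1) * PERIOD)
// that assumes rolls are perfectly periodic.
public class RollWait {
    public static boolean waitForRolls(AtomicInteger rolls, int minRolls, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (rolls.get() < minRolls) {
            if (System.currentTimeMillis() > deadline) {
                return false; // deadline exceeded: fail with a clear cause, not a race
            }
            try {
                Thread.sleep(50); // poll in small increments
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```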
[jira] [Commented] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895032#comment-13895032 ] stack commented on HBASE-10479: --- HConnection is a confusing name, yeah. Wasn't suggesting we rename that. Was suggesting a name for the new Interface. I think adding an 'Internal' suffix is crass but that may just be me. HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for a public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to the new interface; the deprecated methods would eventually be removed from the public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
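The proposed split can be sketched as plain Java interfaces. The names here are hypothetical, not the ones HBASE-10479 introduces:

```java
// Illustrative sketch: a public interface keeps the client-facing methods; an
// internal sub-interface inherits it and collects the methods that should
// never have been public. New internal methods go only on the sub-interface.
public class ConnectionSplit {
    public interface PublicConnection {      // what user code programs against
        String getTableName();
    }

    public interface InternalConnection extends PublicConnection {
        int getInternalRetryCount();         // internal-only; never added to PublicConnection
    }

    static class ConnectionImpl implements InternalConnection {
        public String getTableName() { return "t1"; }
        public int getInternalRetryCount() { return 3; }
    }

    // The public API hands out only the public view; internal code can cast
    // to InternalConnection (or be handed it directly).
    public static PublicConnection create() { return new ConnectionImpl(); }
}
```

This lets methods be deprecated on the public interface and removed later without breaking internal callers, which keep using the internal view.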
[jira] [Commented] (HBASE-10439) Document how to configure REST server impersonation
[ https://issues.apache.org/jira/browse/HBASE-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895048#comment-13895048 ] stack commented on HBASE-10439: --- +1 Document how to configure REST server impersonation --- Key: HBASE-10439 URL: https://issues.apache.org/jira/browse/HBASE-10439 Project: HBase Issue Type: Task Components: documentation Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.99.0 Attachments: hbase-10439.patch In 0.96, the REST server supports impersonation. Let's document how to configure it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10439) Document how to configure REST server impersonation
[ https://issues.apache.org/jira/browse/HBASE-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang updated HBASE-10439: Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Integrated into trunk. Thanks. Document how to configure REST server impersonation --- Key: HBASE-10439 URL: https://issues.apache.org/jira/browse/HBASE-10439 Project: HBase Issue Type: Task Components: documentation Reporter: Jimmy Xiang Assignee: Jimmy Xiang Priority: Minor Fix For: 0.99.0 Attachments: hbase-10439.patch In 0.96, the REST server supports impersonation. Let's document how to configure it. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895050#comment-13895050 ] Hadoop QA commented on HBASE-10479: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12627672/HBASE-10479.02.patch against trunk revision . ATTACHMENT ID: 12627672 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop1.0{color}. The patch compiles against the hadoop 1.0 profile. {color:green}+1 hadoop1.1{color}. The patch compiles against the hadoop 1.1 profile. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:red}-1 site{color}. The patch appears to cause mvn site goal to fail. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/8633//console This message is automatically generated. HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. 
It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for a public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to the new interface; the deprecated methods would eventually be removed from the public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895104#comment-13895104 ] Aleksandr Shulman commented on HBASE-10481: --- Semantically, it does not make sense to have the previous version be greater than the current version. The script would just generate a report that is the mirror image (adds reported as removes). I don't think this is a meaningful use case to support. The solution would be to add a meaningful error message and also to document the logic. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895108#comment-13895108 ] Hudson commented on HBASE-10480: FAILURE: Integrated in HBase-0.94-JDK7 #41 (See [https://builds.apache.org/job/HBase-0.94-JDK7/41/]) HBASE-10480 TestLogRollPeriod#testWithEdits may fail due to insufficient waiting (mbertozzi: rev 1565776) * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollPeriod.java TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 
2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895109#comment-13895109 ] stack commented on HBASE-10340: --- Patch works nicely for me. Brought up RS w/ this patch applied and it played nicely in a cluster where others did not have the patch applied. Then brought up new master. That worked. Then I started second regionserver on same node and master was able to see two RS and from the UI I could click through to both. Let me apply to 0.94. [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HBASE-10484) Backport parent task to 0.94 RegionServer links in table.jsp broken
stack created HBASE-10484: - Summary: Backport parent task to 0.94 RegionServer links in table.jsp broken Key: HBASE-10484 URL: https://issues.apache.org/jira/browse/HBASE-10484 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack Fix For: 0.94.17 HBASE-10340 breaks table.jsp rendering. Need to backport HBASE-10334 now that HBASE-10340 was backported to 0.94. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895122#comment-13895122 ] Hudson commented on HBASE-5356: --- SUCCESS: Integrated in hbase-0.96 #285 (See [https://builds.apache.org/job/hbase-0.96/285/]) HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565744) * /hbase/branches/0.96/bin/region_mover.rb region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevant stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416) at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57) at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018) at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104) at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535) at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525) at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380) at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137) at 
usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535) at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171) at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326) at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535) at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133) at org.jruby.runtime.BlockBody.call(BlockBody.java:73) at org.jruby.runtime.Block.call(Block.java:89) at org.jruby.RubyProc.call(RubyProc.java:268) at org.jruby.RubyProc.call(RubyProc.java:228) at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209) at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895121#comment-13895121 ] Hudson commented on HBASE-10480: SUCCESS: Integrated in hbase-0.96 #285 (See [https://builds.apache.org/job/hbase-0.96/285/]) HBASE-10480 TestLogRollPeriod#testWithEdits may fail due to insufficient waiting (mbertozzi: rev 1565772) * /hbase/branches/0.96/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollPeriod.java TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 
2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10481: -- Attachment: HBASE-10481-v1.patch Adding v1 of the patch. Fixes the case identified in the jira and also corrects some of the output about where the working directory is. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Work started] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-10481 started by Aleksandr Shulman. API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.98.0, 0.94.16, 0.99.0, 0.96.1.1 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.94.16, 0.98.1, 0.99.0, 0.96.1.1 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10340: -- Assignee: stack Committed to 0.94 and to 0.96. Resolving. I don't think 0.94 needs the ancillary HBASE-10334. Checking in HBASE-10484 [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10484) Backport parent task to 0.94 RegionServer links in table.jsp broken
[ https://issues.apache.org/jira/browse/HBASE-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-10484. --- Resolution: Not A Problem Fix Version/s: (was: 0.94.17) Resolving as not a problem. UI seems fine w/o the backport (the changes seem to be present in 0.94 already looking at the patch). Backport parent task to 0.94 RegionServer links in table.jsp broken - Key: HBASE-10484 URL: https://issues.apache.org/jira/browse/HBASE-10484 Project: HBase Issue Type: Sub-task Reporter: stack Assignee: stack HBASE-10340 breaks table.jsp rendering. Need to backport HBASE-10334 now that HBASE-10340 was backported to 0.94. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10481) API Compatibility JDiff script does not properly handle arguments in reverse order
[ https://issues.apache.org/jira/browse/HBASE-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-10481: -- Status: Patch Available (was: In Progress) API Compatibility JDiff script does not properly handle arguments in reverse order -- Key: HBASE-10481 URL: https://issues.apache.org/jira/browse/HBASE-10481 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.96.1.1, 0.94.16, 0.98.0, 0.99.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.98.1, 0.99.0, 0.96.1.1, 0.94.16 Attachments: HBASE-10481-v1.patch [~jmhsieh] found an issue when doing a diff between a pre-0.96 branch and a post-0.96 branch. Typically, if the pre-0.96 branch is specified first, and the post-0.96 branch second, the existing logic handles it. When they are in the reverse order, that case is not handled properly. The fix should address this. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HBASE-10340) [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node
[ https://issues.apache.org/jira/browse/HBASE-10340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-10340. --- Resolution: Fixed Resolving. [BACKPORT] HBASE-9892 Add info port to ServerName to support multi instances in a node -- Key: HBASE-10340 URL: https://issues.apache.org/jira/browse/HBASE-10340 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Fix For: 0.96.2, 0.94.17 Backport this patch after testing it does not break anything. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10185) HBaseClient retries even though a DoNotRetryException was thrown
[ https://issues.apache.org/jira/browse/HBASE-10185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895140#comment-13895140 ] Nicolas Liochon commented on HBASE-10185: - There is actually an issue with the v2 (unrelated to the precommit issue): in waitForWork, we have some special conditions on the calls.empty(), and we can now trigger them. I will fix this. HBaseClient retries even though a DoNotRetryException was thrown Key: HBASE-10185 URL: https://issues.apache.org/jira/browse/HBASE-10185 Project: HBase Issue Type: Bug Components: IPC/RPC Affects Versions: 0.94.12, 0.99.0 Reporter: Samarth Assignee: Nicolas Liochon Fix For: 0.99.0 Attachments: 10185.v1.patch, 10185.v2.patch Throwing a DoNotRetryIOException inside the Writable.write(DataOutput) method doesn't prevent HBase from retrying. Debugging the code locally, I figured that the bug lies in the way HBaseClient simply throws an IOException when it sees that a connection has been closed unexpectedly. Method: public Writable call(Writable param, InetSocketAddress addr, Class<? extends VersionedProtocol> protocol, User ticket, int rpcTimeout) Excerpt of code where the bug is present: while (!call.done) { if (connection.shouldCloseConnection.get()) { throw new IOException("Unexpected closed connection"); } Throwing this IOException causes the ServerCallable.translateException(t) to be a no-op, resulting in HBase retrying. From my limited view and understanding of the code, one way I could think of handling this is by looking at the closeConnection member variable of a connection to determine what kind of exception should be thrown. 
Specifically, when a connection is closed, the current code does this: protected synchronized void markClosed(IOException e) { if (shouldCloseConnection.compareAndSet(false, true)) { closeException = e; notifyAll(); } } Within HBaseClient's call method, the code could possibly be modified to: while (!call.done) { if (connection.shouldCloseConnection.get()) { if (connection.closeException instanceof DoNotRetryIOException) { throw closeException; } throw new IOException("Unexpected closed connection"); } -- This message was sent by Atlassian JIRA (v6.1.5#6160)
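The reporter's suggestion can be modeled in a self-contained sketch. `DoNotRetryIOException` is redefined locally here, and `classify()` is a hypothetical helper, not actual HBaseClient code:

```java
import java.io.IOException;

// Model of the proposed fix: when the connection was closed because of a
// do-not-retry error, surface that exception so the caller's retry logic can
// see it; otherwise keep the generic "Unexpected closed connection".
public class CloseExceptionDemo {
    public static class DoNotRetryIOException extends IOException {
        public DoNotRetryIOException(String msg) { super(msg); }
    }

    // Mirrors the shouldCloseConnection/closeException check sketched in the ticket.
    public static IOException classify(IOException closeException) {
        if (closeException instanceof DoNotRetryIOException) {
            return closeException; // propagate as-is so the caller does not retry
        }
        return new IOException("Unexpected closed connection", closeException);
    }
}
```

Keeping the original exception as the cause (rather than discarding it) also preserves the reason the connection was marked closed.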
[jira] [Created] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
Ted Yu created HBASE-10485: -- Summary: PrefixFilter#filterKeyValue() should perform filtering on row key Key: HBASE-10485 URL: https://issues.apache.org/jira/browse/HBASE-10485 Project: HBase Issue Type: Bug Reporter: Ted Yu Attachments: 10485-0.94.txt Niels reported an issue under the thread 'Trouble writing custom filter for use in FilterList' where his custom filter, used in a FilterList along with PrefixFilter, produced unexpected results. His test can be found here: https://github.com/nielsbasjes/HBase-filter-problem This is due to PrefixFilter#filterKeyValue() using FilterBase#filterKeyValue(), which returns ReturnCode.INCLUDE. When FilterList.Operator.MUST_PASS_ONE is specified, FilterList#filterKeyValue() would return ReturnCode.INCLUDE even when the row key prefix doesn't match, while the other filter's filterKeyValue() returns ReturnCode.NEXT_COL -- This message was sent by Atlassian JIRA (v6.1.5#6160)
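The intended behavior can be sketched without HBase dependencies. `ReturnCode` and `filterKeyValue` below are simplified stand-ins for the real `Filter` API:

```java
import java.util.Arrays;

// Simplified model of the fix HBASE-10485 asks for: filterKeyValue should
// return a skip code when the row does not start with the configured prefix,
// instead of inheriting FilterBase's unconditional INCLUDE (which lets a
// MUST_PASS_ONE FilterList admit rows the prefix filter should reject).
public class PrefixCheck {
    public enum ReturnCode { INCLUDE, NEXT_ROW }

    public static ReturnCode filterKeyValue(byte[] prefix, byte[] row) {
        if (row.length < prefix.length
                || !Arrays.equals(prefix, Arrays.copyOf(row, prefix.length))) {
            return ReturnCode.NEXT_ROW; // prefix mismatch: skip this row
        }
        return ReturnCode.INCLUDE;      // prefix matches: keep the cell
    }
}
```

With this, a MUST_PASS_ONE list combining a prefix check and another filter no longer sees an unconditional INCLUDE vote from the prefix side.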
[jira] [Updated] (HBASE-10485) PrefixFilter#filterKeyValue() should perform filtering on row key
[ https://issues.apache.org/jira/browse/HBASE-10485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-10485: --- Attachment: 10485-0.94.txt Patch for 0.94 All Filter related tests pass. PrefixFilter#filterKeyValue() should perform filtering on row key - Key: HBASE-10485 URL: https://issues.apache.org/jira/browse/HBASE-10485 Project: HBase Issue Type: Bug Reporter: Ted Yu Attachments: 10485-0.94.txt Niels reported an issue under the thread 'Trouble writing custom filter for use in FilterList' where his custom filter, used in a FilterList along with PrefixFilter, produced unexpected results. His test can be found here: https://github.com/nielsbasjes/HBase-filter-problem This is due to PrefixFilter#filterKeyValue() using FilterBase#filterKeyValue(), which returns ReturnCode.INCLUDE. When FilterList.Operator.MUST_PASS_ONE is specified, FilterList#filterKeyValue() would return ReturnCode.INCLUDE even when the row key prefix doesn't match, while the other filter's filterKeyValue() returns ReturnCode.NEXT_COL -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895143#comment-13895143 ] Hudson commented on HBASE-10480: FAILURE: Integrated in HBase-0.94 #1276 (See [https://builds.apache.org/job/HBase-0.94/1276/]) HBASE-10480 TestLogRollPeriod#testWithEdits may fail due to insufficient waiting (mbertozzi: rev 1565776) * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollPeriod.java TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 
2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
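The general remedy for this class of flakiness is to poll for the observed roll count with a generous deadline instead of a single fixed sleep. A hedged sketch of that pattern (the counter is a stand-in for counting "Hlog roll period ... elapsed" events; the actual HBASE-10480 patch may differ):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class WaitForRollsSketch {
    // Poll until rollCount reaches minRolls or the deadline passes, rather
    // than sleeping (minRolls + 1) * LOG_ROLL_PERIOD and hoping the rolls
    // landed on schedule.
    static boolean waitForRolls(AtomicInteger rollCount, int minRolls, long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (rollCount.get() < minRolls) {
            if (System.currentTimeMillis() > deadline) {
                return false; // timed out: the test can fail with a clear cause
            }
            try {
                Thread.sleep(50); // short poll interval
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger rolls = new AtomicInteger();
        // Simulated log roller: five rolls, each slightly "late".
        Thread roller = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                rolls.incrementAndGet();
            }
        });
        roller.start();
        System.out.println(waitForRolls(rolls, 5, 5000)); // true
        roller.join();
    }
}
```

Because the wait tracks the counter rather than wall-clock arithmetic, a roll period that drifts ~1.5s late no longer breaks the test; only a genuinely stuck roller exhausts the deadline.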
[jira] [Updated] (HBASE-10482) ReplicationSyncUp doesn't clean up its ZK, needed for tests
[ https://issues.apache.org/jira/browse/HBASE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-10482: -- Fix Version/s: (was: 0.96.2) I committed the patch to 0.96, 0.98, and to trunk. Leaving open. We can close later if it fixes the issue. ReplicationSyncUp doesn't clean up its ZK, needed for tests --- Key: HBASE-10482 URL: https://issues.apache.org/jira/browse/HBASE-10482 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.1, 0.94.16 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.98.1, 0.99.0, 0.94.17 Attachments: HBASE-10249.patch TestReplicationSyncUpTool failed again: https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/ It's not super obvious why only one of the two tables is replicated, the test could use some more logging, but I understand it this way: The first ReplicationSyncUp gets started and for some reason it cannot replicate the data: {noformat} 2014-02-06 21:32:19,811 INFO [Thread-1372] regionserver.ReplicationSourceManager(203): Current list of replicators: [1391722339091.SyncUpTool.replication.org,1234,1, quirinus.apache.org,37045,1391722237951, quirinus.apache.org,33502,1391722238125] other RSs: [] 2014-02-06 21:32:19,811 INFO [Thread-1372.replicationSource,1] regionserver.ReplicationSource(231): Replicating db42e7fc-7f29-4038-9292-d85ea8b9994b - 783c0ab2-4ff9-4dc0-bb38-86bf31d1d817 2014-02-06 21:32:19,892 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1 2014-02-06 21:32:19,911 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1 2014-02-06 21:32:20,094 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 2 ... 
2014-02-06 21:32:23,414 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 8 2014-02-06 21:32:23,673 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(169): Moving quirinus.apache.org,37045,1391722237951's hlogs to my queue 2014-02-06 21:32:23,768 DEBUG [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(396): Creating quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803 2014-02-06 21:32:23,842 DEBUG [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(396): Creating quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803 2014-02-06 21:32:24,297 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9 2014-02-06 21:32:24,314 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9 {noformat} Finally it gives up: {noformat} 2014-02-06 21:32:30,873 DEBUG [Thread-1372] replication.TestReplicationSyncUpTool(323): SyncUpAfterDelete failed at retry = 0, with rowCount_ht1TargetPeer1 =100 and rowCount_ht2TargetAtPeer1 =200 {noformat} The syncUp tool has an ID you can follow, grep for syncupReplication1391722338885 or just the timestamp, and you can see it doing things after that. The reason is that the tool closes the ReplicationSourceManager but not the ZK connection, so events _still_ come in and NodeFailoverWorker _still_ tries to recover queues but then there's nothing to process them. 
Later in the logs you can see: {noformat} 2014-02-06 21:32:37,381 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(169): Moving quirinus.apache.org,33502,1391722238125's hlogs to my queue 2014-02-06 21:32:37,567 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(239): Won't transfer the queue, another RS took care of it because of: KeeperErrorCode = NoNode for /1/replication/rs/quirinus.apache.org,33502,1391722238125/lock {noformat} There shouldn't be any racing, but now someone already moved quirinus.apache.org,33502,1391722238125 away. FWIW I can't even make the test fail on my machine so I'm not 100% sure closing the ZK connection fixes the issue, but at least it's the right thing to do. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
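The cleanup the comment calls for follows a standard shape: release the ZooKeeper connection along with the source manager, in a finally block, so no failover events arrive after the tool thinks it has shut down. A minimal, hedged sketch (types and names here are illustrative, not the actual ReplicationSyncUp code):

```java
public class SyncUpCleanupSketch {
    interface Closable { void close(); boolean isClosed(); }

    // Simple stand-ins for the ReplicationSourceManager and the ZK connection.
    static class Tracked implements Closable {
        private boolean closed;
        public void close() { closed = true; }
        public boolean isClosed() { return closed; }
    }

    // Close everything in finally so the ZK connection is released even if
    // the replication work throws. Closing only the source manager leaves ZK
    // watches live, and NodeFailoverWorker keeps recovering queues that
    // nothing will ever process.
    static void run(Closable sourceManager, Closable zkConnection) {
        try {
            // ... replicate queued hlogs ...
        } finally {
            sourceManager.close();
            zkConnection.close(); // the missing step described in this issue
        }
    }

    public static void main(String[] args) {
        Tracked mgr = new Tracked(), zk = new Tracked();
        run(mgr, zk);
        System.out.println(mgr.isClosed() && zk.isClosed()); // true
    }
}
```

Once the ZK handle is closed, the tool's ephemeral state goes away cleanly and other region servers can take over queues without racing a ghost process.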
[jira] [Commented] (HBASE-10482) ReplicationSyncUp doesn't clean up its ZK, needed for tests
[ https://issues.apache.org/jira/browse/HBASE-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895166#comment-13895166 ] Demai Ni commented on HBASE-10482: -- +1 from me. [~jdcryans], thank you very much. ReplicationSyncUp doesn't clean up its ZK, needed for tests --- Key: HBASE-10482 URL: https://issues.apache.org/jira/browse/HBASE-10482 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.1, 0.94.16 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.98.1, 0.99.0, 0.94.17 Attachments: HBASE-10249.patch TestReplicationSyncUpTool failed again: https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationSyncUpTool/testSyncUpTool/ It's not super obvious why only one of the two tables is replicated, the test could use some more logging, but I understand it this way: The first ReplicationSyncUp gets started and for some reason it cannot replicate the data: {noformat} 2014-02-06 21:32:19,811 INFO [Thread-1372] regionserver.ReplicationSourceManager(203): Current list of replicators: [1391722339091.SyncUpTool.replication.org,1234,1, quirinus.apache.org,37045,1391722237951, quirinus.apache.org,33502,1391722238125] other RSs: [] 2014-02-06 21:32:19,811 INFO [Thread-1372.replicationSource,1] regionserver.ReplicationSource(231): Replicating db42e7fc-7f29-4038-9292-d85ea8b9994b - 783c0ab2-4ff9-4dc0-bb38-86bf31d1d817 2014-02-06 21:32:19,892 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1 2014-02-06 21:32:19,911 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 1 2014-02-06 21:32:20,094 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 2 ... 
2014-02-06 21:32:23,414 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 8 2014-02-06 21:32:23,673 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(169): Moving quirinus.apache.org,37045,1391722237951's hlogs to my queue 2014-02-06 21:32:23,768 DEBUG [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(396): Creating quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803 2014-02-06 21:32:23,842 DEBUG [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(396): Creating quirinus.apache.org%2C37045%2C1391722237951.1391722243779 with data 10803 2014-02-06 21:32:24,297 TRACE [Thread-1372.replicationSource,2] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9 2014-02-06 21:32:24,314 TRACE [Thread-1372.replicationSource,1] regionserver.ReplicationSource(596): No log to process, sleeping 100 times 9 {noformat} Finally it gives up: {noformat} 2014-02-06 21:32:30,873 DEBUG [Thread-1372] replication.TestReplicationSyncUpTool(323): SyncUpAfterDelete failed at retry = 0, with rowCount_ht1TargetPeer1 =100 and rowCount_ht2TargetAtPeer1 =200 {noformat} The syncUp tool has an ID you can follow, grep for syncupReplication1391722338885 or just the timestamp, and you can see it doing things after that. The reason is that the tool closes the ReplicationSourceManager but not the ZK connection, so events _still_ come in and NodeFailoverWorker _still_ tries to recover queues but then there's nothing to process them. 
Later in the logs you can see: {noformat} 2014-02-06 21:32:37,381 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(169): Moving quirinus.apache.org,33502,1391722238125's hlogs to my queue 2014-02-06 21:32:37,567 INFO [ReplicationExecutor-0] replication.ReplicationQueuesZKImpl(239): Won't transfer the queue, another RS took care of it because of: KeeperErrorCode = NoNode for /1/replication/rs/quirinus.apache.org,33502,1391722238125/lock {noformat} There shouldn't be any racing, but now someone already moved quirinus.apache.org,33502,1391722238125 away. FWIW I can't even make the test fail on my machine so I'm not 100% sure closing the ZK connection fixes the issue, but at least it's the right thing to do. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HBASE-10479) HConnection interface is public but is used internally, and contains a bunch of methods
[ https://issues.apache.org/jira/browse/HBASE-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-10479: - Attachment: HBASE-10479.03.patch HConnectionManager moved to ConnectionManagerInternal, HCM made a thin public wrapper. I removed or deprecated a lot in HCM. Several create/delete/getConnection methods were already deprecated; I frankly have no idea why, as they all look similarly valid to me. Lots of changes all over the place due to extensive usage of internals, esp. in tests. The class has to be public, although the only two methods used outside the client package are setServerSideHConnectionRetries and the nonce generator injection; maybe these should be in yet another wrapper. HConnection interface is public but is used internally, and contains a bunch of methods --- Key: HBASE-10479 URL: https://issues.apache.org/jira/browse/HBASE-10479 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-10479.01.patch, HBASE-10479.02.patch, HBASE-10479.03.patch, HBASE-10479.patch HConnection has too many methods for a public interface, and some of these should not be public. It is used extensively for internal purposes, so we keep adding methods to it that may not make sense for a public interface. The idea is to create a separate internal interface inheriting HConnection, copy some methods to it and deprecate them on HConnection. New methods for internal use would be added to the new interface; the deprecated methods would eventually be removed from the public interface. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
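The refactoring described in the issue follows a common Java pattern: an internal sub-interface extends the public one, internal-only methods migrate there, and their copies on the public interface are deprecated. A hedged, self-contained sketch (these names are illustrative stand-ins, not the actual HBASE-10479 classes):

```java
public class ConnectionSplitSketch {
    // Slim public-facing interface; internal-only methods are deprecated here
    // so external callers get compiler warnings while they migrate.
    interface HConnectionLike {
        boolean isClosed();
        /** @deprecated internal use only; moving to the internal interface. */
        @Deprecated
        void internalHousekeeping();
    }

    // Internal code depends on this; the method is re-declared without
    // deprecation, and new internal-only methods would be added here.
    interface InternalConnection extends HConnectionLike {
        @Override
        void internalHousekeeping();
    }

    static class ConnectionImpl implements InternalConnection {
        private boolean closed;
        public boolean isClosed() { return closed; }
        public void internalHousekeeping() { /* e.g. cache refresh */ }
    }

    public static void main(String[] args) {
        // Internal callers hold the internal type; external code only ever
        // sees HConnectionLike.
        InternalConnection c = new ConnectionImpl();
        c.internalHousekeeping();
        System.out.println(c.isClosed()); // false
    }
}
```

Eventually the deprecated members can be deleted from the public interface without touching internal call sites, which is the end state the issue describes.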
[jira] [Commented] (HBASE-10480) TestLogRollPeriod#testWithEdits may fail due to insufficient waiting
[ https://issues.apache.org/jira/browse/HBASE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895174#comment-13895174 ] Hudson commented on HBASE-10480: SUCCESS: Integrated in HBase-TRUNK #4899 (See [https://builds.apache.org/job/HBase-TRUNK/4899/]) HBASE-10480 TestLogRollPeriod#testWithEdits may fail due to insufficient waiting (mbertozzi: rev 1565770) * /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRollPeriod.java TestLogRollPeriod#testWithEdits may fail due to insufficient waiting Key: HBASE-10480 URL: https://issues.apache.org/jira/browse/HBASE-10480 Project: HBase Issue Type: Test Components: test Reporter: Ted Yu Assignee: Matteo Bertozzi Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0, 0.94.17 Attachments: HBASE-10480-v0.patch The test waits for minRolls rolls by sleeping: {code} Thread.sleep((minRolls + 1) * LOG_ROLL_PERIOD); {code} However, the above wait period may not be sufficient. See https://builds.apache.org/job/HBase-TRUNK/4895/testReport/junit/org.apache.hadoop.hbase.regionserver.wal/TestLogRollPeriod/testWithEdits/ : {code} 2014-02-06 23:02:25,710 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed ... 
2014-02-06 23:02:30,275 DEBUG [RS:0;quirinus:56476.logRoller] regionserver.LogRoller(87): Hlog roll period 4000ms elapsed {code} The interval between two successive periodic rolls was ~1.5s longer than LOG_ROLL_PERIOD (4s): 1.5s * 4 (minRolls - 1) > 4s (LOG_ROLL_PERIOD). This led to the test failure: {code} java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertFalse(Assert.java:64) at org.junit.Assert.assertFalse(Assert.java:74) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.checkMinLogRolls(TestLogRollPeriod.java:168) at org.apache.hadoop.hbase.regionserver.wal.TestLogRollPeriod.testWithEdits(TestLogRollPeriod.java:130) {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-5356) region_mover.rb can hang if table region it belongs to is deleted.
[ https://issues.apache.org/jira/browse/HBASE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895175#comment-13895175 ] Hudson commented on HBASE-5356: --- SUCCESS: Integrated in HBase-TRUNK #4899 (See [https://builds.apache.org/job/HBase-TRUNK/4899/]) HBASE-5356 region_mover.rb can hang if table region it belongs to is deleted (jxiang: rev 1565742) * /hbase/trunk/bin/region_mover.rb region_mover.rb can hang if table region it belongs to is deleted. -- Key: HBASE-5356 URL: https://issues.apache.org/jira/browse/HBASE-5356 Project: HBase Issue Type: Bug Affects Versions: 0.90.3, 0.92.0, 0.94.0 Reporter: Jonathan Hsieh Assignee: Jimmy Xiang Priority: Minor Fix For: 0.98.0, 0.96.2, 0.99.0 Attachments: hbase-5356.patch I was testing the region_mover.rb script on a loaded hbase and noticed that it can hang (thus hanging graceful shutdown) if a region that it is attempting to move gets deleted (by a table delete operation). Here's the start of the relevant stack dump {code} 12/02/08 13:27:13 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. 
for table: TestLoadAndVerify_1328735001040, row=TestLoadAndVerify_1328735001040,yC^P\xD7\x945\xD4,99
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:136)
at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:95)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.prefetchRegionCache(HConnectionManager.java:649)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:703)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:565)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416)
at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018)
at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104)
at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535)
at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.jruby.javasupport.JavaMethod.invokeDirectWithExceptionHandling(JavaMethod.java:525)
at org.jruby.javasupport.JavaMethod.invokeDirect(JavaMethod.java:380)
at org.jruby.java.invokers.InstanceMethodInvoker.call(InstanceMethodInvoker.java:58)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:137)
at usr.lib.hbase.bin.region_mover.method__7$RUBY$isSuccessfulScan(/usr/lib/hbase/bin/region_mover.rb:133)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan.call(usr$lib$hbase$bin$region_mover#method__7$RUBY$isSuccessfulScan:65535)
at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:171)
at usr.lib.hbase.bin.region_mover.block_4$RUBY$__for__(/usr/lib/hbase/bin/region_mover.rb:326)
at usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__.call(usr$lib$hbase$bin$region_mover#block_4$RUBY$__for__:65535)
at org.jruby.runtime.CompiledBlock.yield(CompiledBlock.java:133)
at org.jruby.runtime.BlockBody.call(BlockBody.java:73)
at org.jruby.runtime.Block.call(Block.java:89)
at org.jruby.RubyProc.call(RubyProc.java:268)
at org.jruby.RubyProc.call(RubyProc.java:228)
at org.jruby.RubyProc$i$0$0$call.call(RubyProc$i$0$0$call.gen:65535)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:209)
at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:205)
at