[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529729#comment-13529729 ]

Matt Corgan commented on HBASE-7233:

{quote}I don't follow unless you are saying I should just use COS in place of Encoder{quote}
Argh, I guess I don't have a better solution. I was thinking that encoder implementations would implement CellOutputStream, but the exceptions complicate it. Would it be too weird to have CellOutputStream extend Encoder, adding the IOException?

{quote}I want a random seeker Interface. This looks like it has what we'd need.{quote}
It's designed to be that, but it might have a few more methods than HBase currently needs. After some confusion, I found that the EncodedSeeker does almost everything it needs with just positionAtOrBefore(Cell key).

{quote}Ok on the vints... ugh{quote}
FYI, on the UVIntTool there's a method that pulls the value off an InputStream without allocating objects: UVIntTool.getInt(InputStream is). However, I think I'd recommend sticking to well-known Hadoop formats for the basic RPC stuff. If people actually write high-performance clients in other languages, they would have to read and write these formats.

Serializing KeyValues
Key: HBASE-7233
URL: https://issues.apache.org/jira/browse/HBASE-7233
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
Fix For: 0.96.0
Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt

Undo KeyValue being a Writable.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads
[ https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529730#comment-13529730 ]

Lars Hofhansl commented on HBASE-7336:

bq. Compactions should go get their own Reader?

That sounds like a safe and important improvement. In other cases it actually seems best to try to get a stream and fall back to pread if that fails.

We could drive the number of readers by the size of the store file, something like one reader per n GB (n = 1 or 2, maybe). Then we round-robin the readers.

Should I commit this for now (assuming it passes HadoopQA and there are no objections), and we investigate other options further? Or should we discuss a bit more to see if we find other options?

HFileBlock.readAtOffset does not work well with multiple threads
Key: HBASE-7336
URL: https://issues.apache.org/jira/browse/HBASE-7336
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
Fix For: 0.96.0, 0.94.4
Attachments: 7336-0.94.txt, 7336-0.96.txt

HBase grinds to a halt when many threads scan along the same set of blocks and neither short-circuit reads nor block caching are enabled for the DFS client ... disabling the block cache makes sense on very large scans. It turns out that synchronizing on istream in HFileBlock.readAtOffset is the culprit.
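The "one reader per n GB, then round-robin" idea from the comment can be sketched as follows. This is only an illustration under stated assumptions: RoundRobinReaderPool, readerCount and the Supplier used to open a reader are hypothetical names, not HBase classes, and nothing here reflects the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical pool of readers for one store file; not an HBase class. */
final class RoundRobinReaderPool<R> {
    private static final long GB = 1024L * 1024L * 1024L;

    private final List<R> readers;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinReaderPool(long storeFileSizeBytes, long gbPerReader, Supplier<R> openReader) {
        int count = readerCount(storeFileSizeBytes, gbPerReader);
        List<R> opened = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            opened.add(openReader.get());
        }
        this.readers = opened;
    }

    /** One reader per gbPerReader gigabytes, rounded up, and never fewer than one. */
    static int readerCount(long storeFileSizeBytes, long gbPerReader) {
        long bytesPerReader = gbPerReader * GB;
        long count = (storeFileSizeBytes + bytesPerReader - 1) / bytesPerReader;
        return (int) Math.max(1L, count);
    }

    /** Hands out readers in rotation; floorMod keeps the index valid after int overflow. */
    R nextReader() {
        int index = Math.floorMod(next.getAndIncrement(), readers.size());
        return readers.get(index);
    }

    int size() {
        return readers.size();
    }
}
```

With n = 2, a 5 GB store file would get three readers under this rule, so at most a third of the scanning threads would contend on any single stream, assuming the threads are spread evenly.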
[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529742#comment-13529742 ]

Gabriel Reid commented on HBASE-7325:

I've tested it against the TestReplication unit tests, as well as doing some additional testing with HBaseTestingUtility to verify the expected performance improvement.

Indeed, I think the once-per-second load being put on the namenode should be a non-issue, and worth it for the gain that you get with faster replication on a quiet cluster.

Replication reacts slowly on a lightly-loaded cluster
Key: HBASE-7325
URL: https://issues.apache.org/jira/browse/HBASE-7325
Project: HBase
Issue Type: Bug
Components: Replication
Reporter: Gabriel Reid
Priority: Minor
Attachments: HBASE-7325.patch

ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog.

Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
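The 10-second and 55-second figures in the description can be reproduced with a little arithmetic. The sketch below assumes the sleep grows linearly as base sleep time multiplied by a counter that increases by one per empty poll and is capped at the maximum multiplier; that reading matches the numbers quoted above (1 + 2 + ... + 10 = 55), but ReplicationBackoff and its methods are hypothetical names, not the ReplicationSource code.

```java
/** Hypothetical model of the back-off described in the issue; not HBase code. */
final class ReplicationBackoff {

    /** Sleep before the given 1-based consecutive empty poll, capped at maxMultiplier. */
    static long sleepMillis(long baseSleepMillis, int maxMultiplier, int attempt) {
        return baseSleepMillis * Math.min(attempt, maxMultiplier);
    }

    /** Total idle time accumulated by the time the sleep has reached its maximum. */
    static long millisUntilMaxSleep(long baseSleepMillis, int maxMultiplier) {
        long total = 0;
        for (int attempt = 1; attempt <= maxMultiplier; attempt++) {
            total += sleepMillis(baseSleepMillis, maxMultiplier, attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        // Defaults quoted in the issue: 1 second base sleep, maximum multiplier of 10.
        System.out.println(sleepMillis(1000L, 10, 10));       // prints 10000
        System.out.println(millisUntilMaxSleep(1000L, 10));   // prints 55000
    }
}
```

Under this model, a write that arrives just after the tenth sleep begins waits close to the full 10 seconds before it is replicated.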
[jira] [Commented] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data
[ https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529746#comment-13529746 ]

ramkrishna.s.vasudevan commented on HBASE-7337:

Did you check your values? Are the inserted values Strings, and is the value you are querying for also a String? Just to verify...

SingleColumnValueFilter seems to get unavailble data
Key: HBASE-7337
URL: https://issues.apache.org/jira/browse/HBASE-7337
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.3, 0.96.0
Environment: 0.94
Reporter: Zhou wenjian
Assignee: Zhou wenjian
Fix For: 0.96.0, 0.94.4

Put multiple versions of a row:

r1 cf:q version:1 value:1
r1 cf:q version:2 value:3
r1 cf:q version:3 value:2

The filter in the scan is set as below:

    SingleColumnValueFilter valueF = new SingleColumnValueFilter(
        family, qualifier, CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(2)));

Then I found that all three versions are emitted. I then set latestVersionOnly to false, and the result does not change.

    public ReturnCode filterKeyValue(KeyValue keyValue) {
      // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" + Bytes.toString(keyValue.getValue()));
      if (this.matchedColumn) {
        // We already found and matched the single column, all keys now pass
        return ReturnCode.INCLUDE;
      } else if (this.latestVersionOnly && this.foundColumn) {
        // We found but did not match the single column, skip to next row
        return ReturnCode.NEXT_ROW;
      }
      if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
        return ReturnCode.INCLUDE;
      }
      foundColumn = true;
      if (filterColumnValue(keyValue.getBuffer(),
          keyValue.getValueOffset(), keyValue.getValueLength())) {
        return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
      }
      this.matchedColumn = true;
      return ReturnCode.INCLUDE;
    }

From the code above, it seems that version 3 will be emitted first and set matchedColumn to false, which leads to the following versions 2 and 1 being emitted too.
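To see how the quoted filterKeyValue logic behaves on the three versions from the report, the trace below replays it outside HBase. It is a stand-alone sketch under stated assumptions: SingleColumnFilterTrace is a hypothetical class, values are plain ints instead of byte arrays, filterColumnValue is reduced to an equality check for CompareOp.EQUAL, and the versions are fed newest first, which is the order in which a scan returns them.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical replay of the quoted filterKeyValue logic; not the HBase class. */
final class SingleColumnFilterTrace {
    enum ReturnCode { INCLUDE, NEXT_ROW }

    private final int expected;
    private final boolean latestVersionOnly;
    private boolean foundColumn = false;
    private boolean matchedColumn = false;

    SingleColumnFilterTrace(int expected, boolean latestVersionOnly) {
        this.expected = expected;
        this.latestVersionOnly = latestVersionOnly;
    }

    /** Stand-in for filterColumnValue with CompareOp.EQUAL: true means "filter out". */
    private boolean filterColumnValue(int value) {
        return value != expected;
    }

    ReturnCode filterKeyValue(boolean isTargetColumn, int value) {
        if (matchedColumn) {
            return ReturnCode.INCLUDE;      // already matched: all keys now pass
        } else if (latestVersionOnly && foundColumn) {
            return ReturnCode.NEXT_ROW;     // found but did not match
        }
        if (!isTargetColumn) {
            return ReturnCode.INCLUDE;
        }
        foundColumn = true;
        if (filterColumnValue(value)) {
            return latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
        }
        matchedColumn = true;
        return ReturnCode.INCLUDE;
    }

    /** Replays versions 3, 2, 1 of r1 cf:q (values 2, 3, 1) against EQUAL 2. */
    static List<ReturnCode> trace(boolean latestVersionOnly) {
        SingleColumnFilterTrace filter = new SingleColumnFilterTrace(2, latestVersionOnly);
        int[] valuesNewestFirst = {2, 3, 1};
        List<ReturnCode> codes = new ArrayList<>();
        for (int value : valuesNewestFirst) {
            codes.add(filter.filterKeyValue(true, value));
        }
        return codes;
    }

    public static void main(String[] args) {
        System.out.println(trace(true));    // prints [INCLUDE, INCLUDE, INCLUDE]
        System.out.println(trace(false));   // prints [INCLUDE, INCLUDE, INCLUDE]
    }
}
```

In this replay the newest version (value 2) matches, matchedColumn becomes true, and every later key in the row is included regardless of latestVersionOnly, which is consistent with the comment "all keys now pass" in the quoted code.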
[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove
[ https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529751#comment-13529751 ]

Hudson commented on HBASE-7328:

Integrated in HBase-0.94 #622 (See [https://builds.apache.org/job/HBase-0.94/622/])
HBASE-7328 IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove (Revision 1420545)

Result = SUCCESS
stack :
Files :
* /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/IntegrationTestRebalanceAndKillServers.java

IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove
Key: HBASE-7328
URL: https://issues.apache.org/jira/browse/HBASE-7328
Project: HBase
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
Fix For: 0.96.0, 0.94.4
Attachments: HBASE-7328-v0.patch
[jira] [Updated] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data
[ https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhou wenjian updated HBASE-7337:

Description:

Put multiple versions of a row:

r1 cf:q version:1 value:1
r1 cf:q version:2 value:3
r1 cf:q version:3 value:2

The filter in the scan is set as below:

    SingleColumnValueFilter valueF = new SingleColumnValueFilter(
        family, qualifier, CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(2)));

Then I found that all three versions are emitted. I then set latestVersionOnly to false, and the result does not change.

    public ReturnCode filterKeyValue(KeyValue keyValue) {
      // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" + Bytes.toString(keyValue.getValue()));
      if (this.matchedColumn) {
        // We already found and matched the single column, all keys now pass
        return ReturnCode.INCLUDE;
      } else if (this.latestVersionOnly && this.foundColumn) {
        // We found but did not match the single column, skip to next row
        return ReturnCode.NEXT_ROW;
      }
      if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
        return ReturnCode.INCLUDE;
      }
      foundColumn = true;
      if (filterColumnValue(keyValue.getBuffer(),
          keyValue.getValueOffset(), keyValue.getValueLength())) {
        return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
      }
      this.matchedColumn = true;
      return ReturnCode.INCLUDE;
    }

From the code above, it seems that version 3 will be emitted first and set matchedColumn to true, which leads to the following versions 2 and 1 being emitted too.

was:

Put multiple versions of a row:

r1 cf:q version:1 value:1
r1 cf:q version:2 value:3
r1 cf:q version:3 value:2

The filter in the scan is set as below:

    SingleColumnValueFilter valueF = new SingleColumnValueFilter(
        family, qualifier, CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(2)));

Then I found that all three versions are emitted. I then set latestVersionOnly to false, and the result does not change.

    public ReturnCode filterKeyValue(KeyValue keyValue) {
      // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" + Bytes.toString(keyValue.getValue()));
      if (this.matchedColumn) {
        // We already found and matched the single column, all keys now pass
        return ReturnCode.INCLUDE;
      } else if (this.latestVersionOnly && this.foundColumn) {
        // We found but did not match the single column, skip to next row
        return ReturnCode.NEXT_ROW;
      }
      if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
        return ReturnCode.INCLUDE;
      }
      foundColumn = true;
      if (filterColumnValue(keyValue.getBuffer(),
          keyValue.getValueOffset(), keyValue.getValueLength())) {
        return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
      }
      this.matchedColumn = true;
      return ReturnCode.INCLUDE;
    }

From the code above, it seems that version 3 will be emitted first and set matchedColumn to false, which leads to the following versions 2 and 1 being emitted too.

SingleColumnValueFilter seems to get unavailble data
Key: HBASE-7337
URL: https://issues.apache.org/jira/browse/HBASE-7337
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.3, 0.96.0
Environment: 0.94
Reporter: Zhou wenjian
Assignee: Zhou wenjian
Fix For: 0.96.0, 0.94.4
[jira] [Commented] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data
[ https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529773#comment-13529773 ]

Zhou wenjian commented on HBASE-7337:

they are both String

SingleColumnValueFilter seems to get unavailble data
Key: HBASE-7337
URL: https://issues.apache.org/jira/browse/HBASE-7337
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.3, 0.96.0
Environment: 0.94
Reporter: Zhou wenjian
Assignee: Zhou wenjian
Fix For: 0.96.0, 0.94.4
[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
[ https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529782#comment-13529782 ]

Hadoop QA commented on HBASE-7331:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12560494/HBASE-7331_94.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests.
{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3491//console

This message is automatically generated.

Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
Key: HBASE-7331
URL: https://issues.apache.org/jira/browse/HBASE-7331
Project: HBase
Issue Type: Sub-task
Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
Fix For: 0.94.3, 0.96.0
Attachments: HBASE-7331_94.patch, HBASE-7331_trunk.patch

The following APIs in HRegionServer are either missing hooks to coprocessor or the hooks are not implemented in the AccessController class for security. As a result any unauthorized user can:
1. Open a region
2. Close a region
3. Stop region server
4. Lock a row
5. Unlock a row.
[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads
[ https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529785#comment-13529785 ] Hadoop QA commented on HBASE-7336: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560514/7336-0.96.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 104 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestMultiParallel Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3490//console This message is automatically generated. HFileBlock.readAtOffset does not work well with multiple threads Key: HBASE-7336 URL: https://issues.apache.org/jira/browse/HBASE-7336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.96.0, 0.94.4 Attachments: 7336-0.94.txt, 7336-0.96.txt HBase grinds to a halt when many threads scan along the same set of blocks and neither read short circuit is nor block caching is enabled for the dfs client ... disabling the block cache makes sense on very large scans. It turns out that synchronizing in istream in HFileBlock.readAtOffset is the culprit. 
[jira] [Commented] (HBASE-7334) We should expire the zk session for crashed servers rather than deleting ephemeral znodes
[ https://issues.apache.org/jira/browse/HBASE-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529844#comment-13529844 ]

nkeywal commented on HBASE-7334:

Is there a security impact if we keep the password in a file? We can do some obfuscation, but it will have to be readable, at least by the user account used to start/stop HBase.

We should expire the zk session for crashed servers rather than deleting ephemeral znodes
Key: HBASE-7334
URL: https://issues.apache.org/jira/browse/HBASE-7334
Project: HBase
Issue Type: Improvement
Components: master, regionserver, Zookeeper
Affects Versions: 0.96.0
Reporter: Enis Soztutar

For faster recovery, HBASE-5844 and HBASE-5926 added logic to delete the ephemeral znodes for the master and region server from the hbase-daemon.sh script. However, the master and RSs have other ephemeral nodes that are not cleaned (for example region splitting, table lock).

Instead of deleting the main znode, we can just invalidate the zookeeper session by doing something like HBaseTestingUtility.expireSession(). For this we need to keep the zk.getSessionId() and zk.getSessionPasswd() around (write to a local file), keep the file updated for reconnections, and once we know that the zk session is gone in ZNodeClearer, we can just create a new session with the same credentials, and close that one, effectively causing zk to delete all ephemeral nodes for the session.
[jira] [Commented] (HBASE-7205) Coprocessor classloader is replicated for all regions in the HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529882#comment-13529882 ]

Hudson commented on HBASE-7205:

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-7205 Coprocessor classloader is replicated for all regions in the HRegionServer (Ted Yu and Adrian Muraru) (Revision 1420480)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorClassLoader.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java

Coprocessor classloader is replicated for all regions in the HRegionServer
Key: HBASE-7205
URL: https://issues.apache.org/jira/browse/HBASE-7205
Project: HBase
Issue Type: Bug
Components: Coprocessors
Affects Versions: 0.92.2, 0.94.2
Reporter: Adrian Muraru
Assignee: Ted Yu
Priority: Critical
Fix For: 0.96.0, 0.94.4
Attachments: 7205-0.94.txt, 7205-v10.txt, 7205-v1.txt, 7205-v3.txt, 7205-v4.txt, 7205-v5.txt, 7205-v6.txt, 7205-v7.txt, 7205-v8.txt, 7205-v9.txt, HBASE-7205_v2.patch

HBASE-6308 introduced a new custom CoprocessorClassLoader to load the coprocessor classes, and a new instance of this CL is created for every HRegion opened. This leads to OOME-PermGen when the number of regions goes above hundreds per region server.

Having the table coprocessor jailed in a separate classloader is good; however, we should create only one for all regions of a table in each HRS.
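The "only one classloader for all regions of a table" idea amounts to caching loaders instead of creating one per opened region. The sketch below shows that shape with plain JDK classes, keyed by the coprocessor jar path; SharedCoprocessorClassLoaders is a hypothetical name, the key choice is an assumption, and this is not the committed HBASE-7205 patch.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical per-server cache of coprocessor classloaders; not HBase code. */
final class SharedCoprocessorClassLoaders {
    private final ConcurrentMap<String, ClassLoader> loadersByJar = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    SharedCoprocessorClassLoaders(ClassLoader parent) {
        this.parent = parent;
    }

    /** Returns the loader already created for this jar, creating it at most once. */
    ClassLoader getOrCreate(Path jarPath) {
        String key = jarPath.toAbsolutePath().normalize().toString();
        return loadersByJar.computeIfAbsent(key, k -> {
            try {
                URL jarUrl = jarPath.toAbsolutePath().normalize().toUri().toURL();
                // Loaders are never closed or evicted here; a real cache would need a policy.
                return new URLClassLoader(new URL[] { jarUrl }, parent);
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Bad coprocessor jar path: " + jarPath, e);
            }
        });
    }

    int size() {
        return loadersByJar.size();
    }
}
```

With such a cache, opening hundreds of regions that use the same coprocessor jar would create one loader rather than hundreds, which is the PermGen pressure the issue describes.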
[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove
[ https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529881#comment-13529881 ]

Hudson commented on HBASE-7328:

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-7328 IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove (Revision 1420543)

Result = FAILURE
stack :
Files :
* /hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRebalanceAndKillServers.java

IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove
Key: HBASE-7328
URL: https://issues.apache.org/jira/browse/HBASE-7328
Project: HBase
Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
Fix For: 0.96.0, 0.94.4
Attachments: HBASE-7328-v0.patch
[jira] [Commented] (HBASE-5258) Move coprocessors set out of RegionLoad
[ https://issues.apache.org/jira/browse/HBASE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529880#comment-13529880 ]

Hudson commented on HBASE-5258:

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-5258 Move coprocessors set out of RegionLoad - Addendum (Sergey) (Revision 1420521)

Result = FAILURE
tedyu :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

Move coprocessors set out of RegionLoad
Key: HBASE-5258
URL: https://issues.apache.org/jira/browse/HBASE-5258
Project: HBase
Issue Type: Task
Reporter: Ted Yu
Assignee: Sergey Shelukhin
Priority: Critical
Fix For: 0.96.0, 0.94.4
Attachments: HBASE-5258-094.patch, HBASE-5258-fix-on-top-of-v1.patch, HBASE-5258-v0.patch, HBASE-5258-v1.patch

When I worked on HBASE-5256, I revisited the code related to Ser/De of the coprocessors set in RegionLoad. I think the rationale for embedding the coprocessors set is maximum flexibility, where each region can load different coprocessors. This flexibility adds cost to the region server to Master communication and increases the footprint of the Master heap.

Would HServerLoad be a better place for this set? If required, the region server should calculate the disparity of loaded coprocessors among regions and send a report through HServerLoad.
[jira] [Commented] (HBASE-7211) Improve hbase ref guide for the testing part.
[ https://issues.apache.org/jira/browse/HBASE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529986#comment-13529986 ]

nkeywal commented on HBASE-7211:

I will commit Jeffrey's patch tomorrow if there is no objection.

Improve hbase ref guide for the testing part.
Key: HBASE-7211
URL: https://issues.apache.org/jira/browse/HBASE-7211
Project: HBase
Issue Type: Bug
Components: documentation
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: hbase-7211-partial.patch

Here is some stuff I saw. I will propose a fix in a week or so; please add any comments or issues you have in mind.

??15.6.1. Apache HBase Modules??
=> We should be able to use categories in all modules. The default should be small, but any test manipulating the time needs to be in a specific JVM (hence medium), so it's not always related to minicluster.

??15.6.3.6. hbasetests.sh??
=> We can remove this chapter, and the script. The script is not totally useless, but I think nobody actually uses it.

=> Add a chapter on flakiness. Some tests are, unfortunately, flaky. While their number is decreasing, we still have some. Rules are:
- don't write flaky tests! :-)
- small tests cannot be flaky, as it blocks other test execution. Corollary: if you have an issue with a small test, it's either your environment or a severe issue.
- rerun the test a few times to validate, check the ports and file descriptors used.

??mvn test -P localTests -Dtest=MyTest??
=> We could actually activate the localTests profile whenever -Dtest is used. If we do that, we can remove the reference from localTests in the doc.

??mvn test -P runSmallTests??
??mvn test -P runMediumTests??
=> I'm not sure it's actually used. We could remove them from the pom.xml (and the doc).

??The HBase build uses a patched version of the maven surefire plugin??
=> Hopefully, we will be able to remove this soon :-)

??Integration tests are described TODO: POINTER_TO_INTEGRATION_TEST_SECTION??
=> Should be documented
[jira] [Commented] (HBASE-7313) ColumnPaginationFilter should reset count when moving to NEXT_ROW
[ https://issues.apache.org/jira/browse/HBASE-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530005#comment-13530005 ] Hadoop QA commented on HBASE-7313: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560213/7313-trunk.txt against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 104 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.filter.TestFilter Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3493//console This message is automatically generated. ColumnPaginationFilter should reset count when moving to NEXT_ROW - Key: HBASE-7313 URL: https://issues.apache.org/jira/browse/HBASE-7313 Project: HBase Issue Type: Bug Components: Filters Affects Versions: 0.94.3, 0.96.0 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.96.0, 0.94.4 Attachments: 7313-0.94.txt, 7313-trunk.txt ColumnPaginationFilter does not reset count to zero on moving to next row. Hence, if we have already gotten limit number of columns - the subsequent rows will always return 0 columns. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
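The bug described in HBASE-7313 (a per-row column counter that is never reset) can be illustrated with a small stand-alone sketch. This is not the HBase ColumnPaginationFilter code and does not use the HBase Filter API; the class and method names (PagingColumnCounter, includeNextColumn, nextRow) are hypothetical and exist only to show why the count has to go back to zero at each row boundary.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-row column pagination filter; not the HBase class. */
class PagingColumnCounter {
    private final int limit;   // maximum number of columns to return per row
    private final int offset;  // number of columns to skip at the start of each row
    private int count = 0;     // columns seen so far in the current row

    PagingColumnCounter(int limit, int offset) {
        this.limit = limit;
        this.offset = offset;
    }

    /** Returns true if the next column of the current row falls inside the page. */
    boolean includeNextColumn() {
        boolean include = count >= offset && count < offset + limit;
        count++;
        return include;
    }

    /** Called when the scan moves to the next row; without this reset later rows return 0 columns. */
    void nextRow() {
        count = 0;
    }
}

class PagingColumnCounterDemo {
    /** Scans rows that each hold columnsPerRow columns and records how many were included per row. */
    static List<Integer> scan(PagingColumnCounter filter, int rows, int columnsPerRow) {
        List<Integer> included = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            int n = 0;
            for (int c = 0; c < columnsPerRow; c++) {
                if (filter.includeNextColumn()) {
                    n++;
                }
            }
            included.add(n);
            filter.nextRow();
        }
        return included;
    }

    public static void main(String[] args) {
        // limit=2, offset=0, three rows of five columns each
        System.out.println(scan(new PagingColumnCounter(2, 0), 3, 5));
    }
}
```

With the nextRow() call removed from scan, only the first row would contribute columns, which matches the symptom in the issue description.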
[jira] [Commented] (HBASE-7315) Remove support for client-side RowLocks
[ https://issues.apache.org/jira/browse/HBASE-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530027#comment-13530027 ] Hadoop QA commented on HBASE-7315: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560447/HBASE-7315-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 105 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 21 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3492//console This message is automatically generated. Remove support for client-side RowLocks --- Key: HBASE-7315 URL: https://issues.apache.org/jira/browse/HBASE-7315 Project: HBase Issue Type: Sub-task Components: Transactions/MVCC Reporter: Gregory Chanan Assignee: Gregory Chanan Fix For: 0.96.0 Attachments: HBASE-7315.patch, HBASE-7315-v2.patch See comments in HBASE-7263. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition
[ https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530062#comment-13530062 ] ramkrishna.s.vasudevan commented on HBASE-7335: --- @Kyle Could you attach the logs from the time of the split? Is it possible? Failed split can cause a region to get stuck in transition -- Key: HBASE-7335 URL: https://issues.apache.org/jira/browse/HBASE-7335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.1 Reporter: Kyle McGovern Trying to reassign a region after a failed split causes that region to get stuck in transition. hdfs dfs -R output http://pastebin.com/F4DgTxj1 hbck output http://pastebin.com/BaftESBd error on regionserver http://pastebin.com/Mye60rUA For example, if I remove /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590 the region will successfully assign and hbck does not show errors for this region anymore. The contents of the file appear to just be a split key. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads
[ https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530074#comment-13530074 ] Lars Hofhansl commented on HBASE-7336: -- TestMultiParallel passed locally. HFileBlock.readAtOffset does not work well with multiple threads Key: HBASE-7336 URL: https://issues.apache.org/jira/browse/HBASE-7336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.96.0, 0.94.4 Attachments: 7336-0.94.txt, 7336-0.96.txt HBase grinds to a halt when many threads scan along the same set of blocks and neither short-circuit reads nor block caching is enabled for the dfs client ... disabling the block cache makes sense on very large scans. It turns out that synchronizing on istream in HFileBlock.readAtOffset is the culprit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
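The contention in HBASE-7336 comes from many threads sharing one stream whose seek-then-read sequence has to be serialized. A positional read takes the offset as an argument and leaves the shared position untouched, so callers do not need a common lock. The sketch below shows that idea with java.nio FileChannel on a local temp file as a stand-in; it is not the HFileBlock or HDFS code, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical illustration of positional reads; a local file stands in for a store file. */
class PositionalReadSketch {
    /** Reads up to length bytes starting at offset without moving the channel's own position. */
    static String readAt(FileChannel channel, long offset, int length) {
        ByteBuffer buf = ByteBuffer.allocate(length);
        try {
            while (buf.hasRemaining()) {
                // read(dst, position) does not modify the channel position,
                // so several threads may call it on the same channel concurrently
                int n = channel.read(buf, offset + buf.position());
                if (n < 0) {
                    break; // end of file
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf.array(), 0, buf.position(), StandardCharsets.US_ASCII);
    }

    /** Writes three fake 6-byte "blocks" and reads two of them out of order. */
    static String demo() {
        try {
            Path file = Files.createTempFile("blocks", ".bin");
            try {
                Files.write(file, "block0block1block2".getBytes(StandardCharsets.US_ASCII));
                try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                    return readAt(ch, 6, 6) + "," + readAt(ch, 0, 6);
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Whether positional reads are cheaper than a dedicated stream per reader depends on the underlying file system, which is the trade-off being discussed in the issue.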
[jira] [Commented] (HBASE-7205) Coprocessor classloader is replicated for all regions in the HRegionServer
[ https://issues.apache.org/jira/browse/HBASE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530088#comment-13530088 ] Adrian Muraru commented on HBASE-7205: -- Lars you're right, apparently there is one thread keeping a strong reference to our custom classloader. The thing is that this seems to be a junit thread; when I'm testing manually with HBase standalone by enabling/disabling a multi-region table I can see these instances GC'ed. Not 100% sure but I suspect junit is doing some sort of class loading accounting, for reporting purposes or so, and keeps these references. Coprocessor classloader is replicated for all regions in the HRegionServer -- Key: HBASE-7205 URL: https://issues.apache.org/jira/browse/HBASE-7205 Project: HBase Issue Type: Bug Components: Coprocessors Affects Versions: 0.92.2, 0.94.2 Reporter: Adrian Muraru Assignee: Ted Yu Priority: Critical Fix For: 0.96.0, 0.94.4 Attachments: 7205-0.94.txt, 7205-v10.txt, 7205-v1.txt, 7205-v3.txt, 7205-v4.txt, 7205-v5.txt, 7205-v6.txt, 7205-v7.txt, 7205-v8.txt, 7205-v9.txt, HBASE-7205_v2.patch HBASE-6308 introduced a new custom CoprocessorClassLoader to load the coprocessor classes and a new instance of this CL is created for each single HRegion opened. This leads to OOME-PermGen when the number of regions goes above hundreds per region server. Having the table coprocessor jailed in a separate classloader is good, however we should create only one for all regions of a table in each HRS. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
[ https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530091#comment-13530091 ] Gary Helmling commented on HBASE-7326: -- bq. Could we get rid of SortedCopyOnWriteSet Gary for CSLS? That's the idea. Given the lack of locking for CSLS, it may be just as low overhead for iteration and would actually be fully thread safe. In which case, let's dump SortedCopyOnWriteSet if it doesn't buy us anything. SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations - Key: HBASE-7326 URL: https://issues.apache.org/jira/browse/HBASE-7326 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.92.2, 0.94.3, 0.96.0 Reporter: Gary Helmling The SortedCopyOnWriteSet implementation uses an internal TreeSet that is copied and replaced on mutation operations. However, in a few areas, SortedCopyOnWriteSet leaks references to the underlying TreeSet implementations, allowing for unsafe usage: * iterator() * subSet() * headSet() * tailSet() For Iterator.remove(), we can wrap in an implementation that throws UnsupportedOperationException. For the sub set methods, we could return new SortedCopyOnWriteSet instances (which would not modify the parent set), or wrap with a new sub set implementation that safely allows modification of the parent set. To be clear, the current usage of SortedCopyOnWriteSet does not make use of any of these non-thread-safe methods, but the implementation should be fixed to be completely thread safe and prevent any new issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
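The Iterator.remove() part of the proposal in HBASE-7326 can be sketched as a thin wrapper around the iterator of the internal TreeSet. This is only an illustration of the idea under the assumption that iteration happens over an immutable snapshot; ReadOnlyIterator is a hypothetical name and not a class from the HBase code base.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

/** Hypothetical wrapper that forwards reads and rejects remove(), so the backing set cannot be mutated. */
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    ReadOnlyIterator(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public E next() {
        return delegate.next();
    }

    @Override
    public void remove() {
        // Mutations must go through the copy-on-write set itself, never through a leaked iterator.
        throw new UnsupportedOperationException("read-only snapshot iterator");
    }
}

class ReadOnlyIteratorDemo {
    /** Returns true if remove() through the wrapper is rejected and the backing set is unchanged. */
    static boolean removeIsRejected() {
        TreeSet<String> internal = new TreeSet<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = new ReadOnlyIterator<>(internal.iterator());
        it.next();
        try {
            it.remove();
            return false;
        } catch (UnsupportedOperationException expected) {
            return internal.size() == 3;
        }
    }

    public static void main(String[] args) {
        System.out.println(removeIsRejected());
    }
}
```

The sub set methods (subSet, headSet, tailSet) would need their own treatment, as the issue description notes; the wrapper above covers only iterator().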
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530123#comment-13530123 ] Hiroshi Ikeda commented on HBASE-7295: -- It is meaningless to change the final instance variable PoolMap to volatile, because its effects around ensuring visibility between threads apply only when you get/set the reference itself. Also PoolMap is not thread safe indeed, and we don't tell what happens from the beginning (HBASE-6651). Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.96.0, 0.94.4 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would timeout after maxIdleTime - underlying PoolMap connections object is thread safe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
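As the comment points out, neither final nor volatile on the field makes the object it refers to thread safe; the container itself has to handle concurrent access. The sketch below swaps in a different technique from the patch under discussion: a java.util.concurrent.ConcurrentHashMap with computeIfAbsent, so lookups do not go through one shared monitor. It is not the HBaseClient or PoolMap code, it ignores pooling of several connections per server, and all names (ConnectionCacheSketch, Connection, the "rs1:60020" key) are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical get-or-create connection cache without a global synchronized block. */
class ConnectionCacheSketch {
    /** Placeholder for a real RPC connection. */
    static final class Connection {
        final String remoteId;

        Connection(String remoteId) {
            this.remoteId = remoteId;
        }
    }

    // final guarantees safe publication of the reference; thread safety comes from the map type
    private final ConcurrentHashMap<String, Connection> connections = new ConcurrentHashMap<>();
    private final AtomicInteger created = new AtomicInteger();

    Connection getConnection(String remoteId) {
        // computeIfAbsent applies the factory at most once per absent key
        return connections.computeIfAbsent(remoteId, id -> {
            created.incrementAndGet();
            return new Connection(id);
        });
    }

    /** Starts the given number of threads that all ask for the same server; returns how many connections were created. */
    static int createdUnderContention(int threads) {
        ConnectionCacheSketch cache = new ConnectionCacheSketch();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                    cache.getConnection("rs1:60020");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cache.created.get();
    }

    public static void main(String[] args) {
        System.out.println(createdUnderContention(16));
    }
}
```

Threads asking for different servers mostly proceed independently here, while threads racing on the same absent key still wait briefly for the single creation to finish.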
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530128#comment-13530128 ] Lars Hofhansl commented on HBASE-7295: -- Indeed. You're right of course. Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.96.0, 0.94.4 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would timeout after maxIdleTime - underlying PoolMap connections object is thread safe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530139#comment-13530139 ] Andrew Purtell commented on HBASE-7317: --- {quote} trunk pom already specifies htrace: <htrace.version>1.49</htrace.version> {quote} Should we be depending on something that only has a single contributor and hasn't seen a commit in over three months? server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
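The second suggestion in the HBASE-7317 description (tag each request with an id drawn from a client-local sequence and log it as it moves between components) can be sketched roughly as follows. This is not htrace and not HBase code; RequestTraceSketch, its methods, and the component name are hypothetical, and an in-memory list stands in for a real logger.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical request-progress log keyed by a client-local sequence number. */
class RequestTraceSketch {
    // Local sequence only; as the issue says, no global uniqueness is required.
    private final AtomicLong localSequence = new AtomicLong();
    // Stand-in for a real logger so the sketch stays self-contained.
    private final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    /** Client side: draw the next id and attach it to the outgoing request. */
    long nextRequestId() {
        return localSequence.incrementAndGet();
    }

    /** Server side: record that the request arrived. */
    void received(long id) {
        log.add("request id=" + id + " received");
    }

    /** Server side: record which component is currently handling the request. */
    void processing(long id, String component) {
        log.add("request id=" + id + " is being processed in " + component);
    }

    /** Returns a copy of the log lines recorded so far. */
    List<String> lines() {
        synchronized (log) {
            return new ArrayList<>(log);
        }
    }

    public static void main(String[] args) {
        RequestTraceSketch trace = new RequestTraceSketch();
        long id = trace.nextRequestId();
        trace.received(id);
        trace.processing(id, "MyClass");
        for (String line : trace.lines()) {
            System.out.println(line);
        }
    }
}
```

Searching the log for a single id then gives the path of one slow request without needing to catch it with jstack.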
[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection
[ https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530138#comment-13530138 ] Lars Hofhansl commented on HBASE-7295: -- In fact I had misread the whole patch (looked to me like we're checking and rechecking connections, but we're checking the connection we're retrieving from connections, hence Ted's comment about making that volatile). Contention in HBaseClient.getConnection --- Key: HBASE-7295 URL: https://issues.apache.org/jira/browse/HBASE-7295 Project: HBase Issue Type: Improvement Affects Versions: 0.94.3 Reporter: Varun Sharma Assignee: Varun Sharma Fix For: 0.96.0, 0.94.4 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt HBaseClient.getConnection() synchronizes on the connections object. We found severe contention on a thrift gateway which was fanning out roughly 3000+ calls per second to hbase region servers. The thrift gateway had 2000+ threads for handling incoming connections. Threads were blocked on the synchronized block - we set ipc.pool.size to 200. Since we are using RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on connections - it might lead to cases where we might go slightly over the ipc.max.pool.size() but the additional connections would timeout after maxIdleTime - underlying PoolMap connections object is thread safe. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530145#comment-13530145 ] stack commented on HBASE-7317: -- bq. Should we be depending on something that only has a single contributor and hasn't seen a commit in over three months? Fair point. Hope was that we'd add tracing to hbase w/ this as a start (and that hadoop itself would be adding trace I suppose so we could go down into datanodes). If no progress on tracing before, say 0.96, yeah, let's remove it. But maybe there will be progress made in this issue. Regarding a central collector for traces, could try writing to an hbase table. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530153#comment-13530153 ] Andrew Purtell commented on HBASE-7317: --- bq. Fair point We could also go in the other direction, reach out to Jon for a grant to port to Apache as Sergey said, and then carry it forward maintained in tree. It would need a sponsor, and work to make it useful along the lines that Todd and you suggest. Do we have that is the question. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove
[ https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530163#comment-13530163 ] Sergey Shelukhin commented on HBASE-7328: - Thanks! IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove Key: HBASE-7328 URL: https://issues.apache.org/jira/browse/HBASE-7328 Project: HBase Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Trivial Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7328-v0.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition
[ https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530173#comment-13530173 ] Kyle McGovern commented on HBASE-7335: -- [~ram_krish] I'm not sure exactly when the split failed so finding the logs might be difficult. Is there any string in particular I might be able to search for? Failed split can cause a region to get stuck in transition -- Key: HBASE-7335 URL: https://issues.apache.org/jira/browse/HBASE-7335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.1 Reporter: Kyle McGovern Trying to reassign a region after a failed split causes that region to get stuck in transition. hdfs dfs -R output http://pastebin.com/F4DgTxj1 hbck output http://pastebin.com/BaftESBd error on regionserver http://pastebin.com/Mye60rUA For example, if I remove /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590 the region will successfully assign and hbck does not show errors for this region anymore. The contents of the file appear to just be a split key. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530177#comment-13530177 ] Todd Lipcon commented on HBASE-7317: The license is already Apache, so if someone wants to make changes and send a pull request, I'm happy to pull them in and publish a new version of htrace. I don't think we need substantial changes to htrace itself - more work is remaining in the trace collection / viewing area. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530180#comment-13530180 ] Andrew Purtell commented on HBASE-7317: --- bq. The license is already Apache, so if someone wants to make changes and send a pull request, I'm happy to pull them in and publish a new version of htrace. Any chance of getting spans into HDFS with the current project hosting? server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7334) We should expire the zk session for crashed servers rather than deleting ephemeral znodes
[ https://issues.apache.org/jira/browse/HBASE-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530193#comment-13530193 ] Enis Soztutar commented on HBASE-7334: -- bq. Is there a security impact if we keep the password in a file? We can do some dissimulation, but it will have to be readable, at least by the user account used to start/stop hbase. Good question. If we make that file only readable by the hbase user it should be fine I think, since he has access to the credentials anyway. We should expire the zk session for crashed servers rather than deleting ephemeral znodes - Key: HBASE-7334 URL: https://issues.apache.org/jira/browse/HBASE-7334 Project: HBase Issue Type: Improvement Components: master, regionserver, Zookeeper Affects Versions: 0.96.0 Reporter: Enis Soztutar For faster recovery HBASE-5844 and HBASE-5926 added logic to delete the ephemeral znodes for the master and region server from the hbase-daemon.sh script. However, the master and RSs have other ephemeral nodes that are not cleaned (for example region splitting, table lock). Instead of deleting the main znode, we can just invalidate the zookeeper session by doing something like HBaseTestingUtility.expireSession(). For this we need to keep the zk.getSessionId() and zk.getSessionPasswd() around (write to a local file), keep the file updated for reconnections, and once we know that the zk session is gone in ZNodeClearer, we can just create a new session with the same credentials, and close that one, effectively causing zk to delete all ephemeral nodes for the session. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations
[ https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530197#comment-13530197 ] Sergey Shelukhin commented on HBASE-7305: - After a cursory look at the patch, I have two questions... 1) I think I saw an article about standard zk primitives library, and iirc even discussed it with Enis. Is it the curator library meant above? If so we should probably switch to it. Especially if it's easy to modify :) 2) More broadly, I wonder about the scalability impact of this. At the minimum, locks need to be write-preference to prevent region servers on large clusters from starving the clients and master forever (separate problem is what to do with stray region servers stuck with lock (ZK will take care of that?), but many servers can starve master/clients by sheer force of numbers). ZK based Read/Write locks for table operations -- Key: HBASE-7305 URL: https://issues.apache.org/jira/browse/HBASE-7305 Project: HBase Issue Type: Bug Components: Client, master, Zookeeper Reporter: Enis Soztutar Assignee: Enis Soztutar Fix For: 0.96.0 Attachments: hbase-7305_v0.patch This has started as forward porting of HBASE-5494 and HBASE-5991 from the 89-fb branch to trunk, but diverged enough to have its own issue. The idea is to implement a zk based read/write lock per table. Master initiated operations should get the write lock, and region operations (region split, moving, balance?, etc) acquire a shared read lock. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
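The write-preference point in the comment above can be illustrated with an in-process analogue: once a writer is waiting, new readers are held back, so a steady stream of readers cannot starve the writer. The sketch below is a plain Java monitor, not the ZooKeeper-based lock from the patch and not a Curator recipe; WritePreferringLock and its methods are hypothetical names.

```java
/** Hypothetical in-process read/write lock that gives waiting writers priority over new readers. */
class WritePreferringLock {
    private int activeReaders = 0;
    private int waitingWriters = 0;
    private boolean writerActive = false;

    synchronized void lockRead() throws InterruptedException {
        // New readers wait while a writer holds the lock or is queued for it.
        while (writerActive || waitingWriters > 0) {
            wait();
        }
        activeReaders++;
    }

    /** Non-blocking variant: returns false if a writer holds or is waiting for the lock. */
    synchronized boolean tryLockRead() {
        if (writerActive || waitingWriters > 0) {
            return false;
        }
        activeReaders++;
        return true;
    }

    synchronized void unlockRead() {
        activeReaders--;
        if (activeReaders == 0) {
            notifyAll();
        }
    }

    synchronized void lockWrite() throws InterruptedException {
        waitingWriters++;
        try {
            while (writerActive || activeReaders > 0) {
                wait();
            }
            writerActive = true;
        } finally {
            waitingWriters--;
            // Wake blocked threads so they re-check state; this matters if the writer was interrupted.
            notifyAll();
        }
    }

    synchronized void unlockWrite() {
        writerActive = false;
        notifyAll();
    }

    synchronized int waitingWriters() {
        return waitingWriters;
    }

    /** Returns true if a new reader is refused while a writer waits and admitted after the writer is done. */
    static boolean readerBlockedWhileWriterWaits() {
        WritePreferringLock lock = new WritePreferringLock();
        try {
            lock.lockRead();
            Thread writer = new Thread(() -> {
                try {
                    lock.lockWrite();
                    lock.unlockWrite();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writer.start();
            while (lock.waitingWriters() == 0) {
                Thread.sleep(1); // wait until the writer is queued behind the active reader
            }
            boolean newReaderRefused = !lock.tryLockRead();
            lock.unlockRead();
            writer.join();
            boolean readerAdmittedAfterwards = lock.tryLockRead();
            return newReaderRefused && readerAdmittedAfterwards;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readerBlockedWhileWriterWaits());
    }
}
```

A distributed lock would also have to deal with holders that die while holding the lock, which is the stray region server question raised in the same comment and is outside this sketch.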
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530203#comment-13530203 ] Sergey Shelukhin commented on HBASE-7317: - Hmm, somehow I missed that in the book. That looks very useful :) I have looked at the source a bit; is there any good way to add debug information to Span-s, e.g. exceptions/etc.? As far as I understand it currently would trace operation starts/ends, right? From the patch in the JIRA that adds the hooks, it looks like more hooks should be added. Wrt placement, is there a reason to not put it into org.apache.common..., with only HDFS/HBase/etc. specific receivers living in their corresponding projects? I can do it when I have bandwidth if there are no legal/procedural objections or objections from the author. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530204#comment-13530204 ] Andrew Purtell commented on HBASE-7317: --- bq. Wrt placement, is there a reason to not put it into org.apache.common..., with only HDFS/HBase/etc. specific receivers living in their corresponding projects? I can do it when I have bandwidth if there are no legal/procedural objections or objections from the author. +1 to this server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530206#comment-13530206 ] Sergey Shelukhin commented on HBASE-7243: - +1 Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530209#comment-13530209 ] Sergey Shelukhin commented on HBASE-7268: - ping? Thanks. Do we want to consider supplying open timestamp from the master too? correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report from RegionMoved exception processing logic R moved from C to B, even though such transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for a yet unknown reason.
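One way to picture the "open timestamp" idea from the comment: the client cache keeps a timestamp with each location and ignores updates that are older than what it already holds. The sketch below is a minimal illustration under that assumption. `RegionLocationCache`, `Location`, `update`, and `serverFor` are hypothetical names, and this is not the logic of the attached patch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical client-side cache that ignores location updates older than what it holds. */
final class RegionLocationCache {
    static final class Location {
        final String server;
        final long openTimestamp; // assumed to be supplied with every update

        Location(String server, long openTimestamp) {
            this.server = server;
            this.openTimestamp = openTimestamp;
        }
    }

    private final ConcurrentMap<String, Location> cache = new ConcurrentHashMap<>();

    /** Stores the update unless a newer entry is cached; returns true if the update was applied. */
    boolean update(String region, String server, long openTimestamp) {
        Location candidate = new Location(server, openTimestamp);
        Location result = cache.merge(region, candidate,
            (old, fresh) -> fresh.openTimestamp >= old.openTimestamp ? fresh : old);
        return result == candidate;
    }

    /** Returns the cached server for the region, or null if none is cached. */
    String serverFor(String region) {
        Location loc = cache.get(region);
        return loc == null ? null : loc.server;
    }
}
```

In the scenario from the description, a late "R moved to B" notification carrying an older timestamp would be dropped instead of overwriting the correct entry for C.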
[jira] [Commented] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
[ https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530230#comment-13530230 ] Ted Yu commented on HBASE-7326: --- +1 on dropping SortedCopyOnWriteSet SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations - Key: HBASE-7326 URL: https://issues.apache.org/jira/browse/HBASE-7326 Project: HBase Issue Type: Bug Components: util Affects Versions: 0.92.2, 0.94.3, 0.96.0 Reporter: Gary Helmling The SortedCopyOnWriteSet implementation uses an internal TreeSet that is copied and replaced on mutation operations. However, in a few areas, SortedCopyOnWriteSet leaks references to the underlying TreeSet implementations, allowing for unsafe usage: * iterator() * subSet() * headSet() * tailSet() For Iterator.remove(), we can wrap in an implementation that throws UnsupportedOperationException. For the sub set methods, we could return new SortedCopyOnWriteSet instances (which would not modify the parent set), or wrap with a new sub set implementation that safely allows modification of the parent set. To be clear, the current usage of SortedCopyOnWriteSet does not make use of any of these non-thread-safe methods, but the implementation should be fixed to be completely thread safe and prevent any new issues. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
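For the `Iterator.remove()` part of the description, a wrapper along the following lines would be enough. This is a minimal sketch: `ReadOnlyIterator` is a hypothetical name, not a class from the HBase code base, and it covers only `iterator()`, not the sub set methods.

```java
import java.util.Iterator;

/** Hypothetical wrapper: iterates over a snapshot but refuses remove(). */
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    ReadOnlyIterator(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public E next() {
        return delegate.next();
    }

    @Override
    public void remove() {
        // Mutating through the iterator would bypass the copy-on-write path.
        throw new UnsupportedOperationException("remove() is not supported");
    }
}
```

A set's `iterator()` method would then return `new ReadOnlyIterator<>(internalSet.iterator())`, where `internalSet` stands for the current internal TreeSet snapshot.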
[jira] [Created] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
Himanshu Vashishtha created HBASE-7338: -- Summary: Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange Key: HBASE-7338 URL: https://issues.apache.org/jira/browse/HBASE-7338 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3, 0.96.0 Reporter: Himanshu Vashishtha Priority: Minor
The balancer doesn't run while a region is in transition. The check to confirm whether all regions are assigned looks for region count 22, where the total regions are 27. This may result in a failure:
{code}
java.lang.AssertionError: After 5 attempts, region assignments were not balanced.
at org.junit.Assert.fail(Assert.java:93)
at org.apache.hadoop.hbase.TestRegionRebalancing.assertRegionsAreBalanced(TestRegionRebalancing.java:203)
at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:123)
.
2012-12-11 13:47:02,231 INFO [pool-1-thread-1] hbase.TestRegionRebalancing(120): Added fourth server=p0118.mtv.cloudera.com,44414,1355262422083
2012-12-11 13:47:02,231 INFO [RegionServer:3;p0118.mtv.cloudera.com,44414,1355262422083] regionserver.HRegionServer(3769): Registered RegionServer MXBean
2012-12-11 13:47:02,231 DEBUG [pool-1-thread-1] master.HMaster(987): Not running balancer because 1 region(s) in transition: {c786446fb2542f190e937057cdc79d9d=test,kkk,1355262401365.c786446fb2542f190e937057cdc79d9d. state=OPENING, ts=1355262421037, server=p0118.mtv.cloudera.com,54281,1355262419765}
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(165): There are 4 servers and 26 regions. Load Average: 13.0 low border: 9, up border: 16; attempt: 0
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,51590,1355262395329 Avg: 13.0 actual: 11
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,52987,1355262407916 Avg: 13.0 actual: 15
2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,48044,1355262421787 Avg: 13.0 actual: 0
2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(179): p0118.mtv.cloudera.com,48044,1355262421787 Isn't balanced!!! Avg: 13.0 actual: 0 slop: 0.2
2012-12-11 13:47:12,233 DEBUG [pool-1-thread-1] master.HMaster(987): Not running balancer because 1 region(s) in transition:
{code}
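The borders in the log can be reproduced with a small calculation. The sketch below assumes the formulas floor(avg * (1 - slop)) - 1 and ceil(avg * (1 + slop)); they were chosen because they give 9 and 16 for the logged average of 13.0 and slop of 0.2, and the actual test code may compute them differently. `BalanceCheck` and its methods are hypothetical names.

```java
/** Hypothetical re-creation of the balance check implied by the log output. */
final class BalanceCheck {
    /** Assumed lower bound on regions per server: floor(avg * (1 - slop)) - 1. */
    static int lowBorder(double avg, double slop) {
        return (int) Math.floor(avg * (1 - slop)) - 1;
    }

    /** Assumed upper bound on regions per server: ceil(avg * (1 + slop)). */
    static int upBorder(double avg, double slop) {
        return (int) Math.ceil(avg * (1 + slop));
    }

    /** A server counts as balanced when its region count lies within both borders. */
    static boolean isBalanced(int actual, double avg, double slop) {
        return actual >= lowBorder(avg, slop) && actual <= upBorder(avg, slop);
    }
}
```

With these bounds the servers holding 11 and 15 regions pass, while the newly added server holding 0 regions fails. It stays at 0 because the balancer refuses to run while a region is in transition.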
[jira] [Updated] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
[ https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7338: --- Attachment: HBASE-7338.patch Ran the test locally and it passes. Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange - Key: HBASE-7338 URL: https://issues.apache.org/jira/browse/HBASE-7338 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3, 0.96.0 Reporter: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7338.patch
[jira] [Updated] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
[ https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7338: --- Status: Patch Available (was: Open) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange - Key: HBASE-7338 URL: https://issues.apache.org/jira/browse/HBASE-7338 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3, 0.96.0 Reporter: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7338.patch
[jira] [Created] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
Jonathan Hsieh created HBASE-7339: - Summary: Splitting a hfilelink causes region servers to go down. Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055
Steps:
- Have a single region table with 15 hfiles in it.
- Snapshot it.
- Clone a snapshot
- region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed.
- it starts splitting
- creating split references, opening daughters fails
- hfile links are split, creating hfile link daughter refs. hfile-region-table.parentregion
- these split hfile links are interpreted as hfile links with table table.parentregion
- Since this is after the splitting PONR (point of no return), this aborts the server. It then spreads to the next server.
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530264#comment-13530264 ] Enis Soztutar commented on HBASE-7243: -- Integration test class name should start with IntegrationTest, can you rename it: http://hbase.apache.org/book/hbase.tests.html#integration.tests Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530267#comment-13530267 ] Jonathan Hsieh commented on HBASE-7339: --- This was encountered when testing online snapshots, but will affect offline snapshots as well.
Suggested solutions:
1) Make opening the hfile-link daughter reference more robust, by attempting to treat it as a reference if treating it as a link fails. Hacky but should work.
2) Make the regexes used to differentiate references and hfilelinks stricter so that we can tell them apart. Hacky, not sure if it will work.
3) Change the daughter reference link file name to be more robust. Currently 'hfile.parentregion', maybe change to 'hfile@parentregion'. This would then allow 'hfile-region-table@parentregion' to be interpreted correctly. This is the right way but breaks compatibility.
Other follow-ons: ideally we are more robust by quarantining a bad region or hfiles/link files if it has killed a few nodes in the cluster.
Splitting a hfilelink causes region servers to go down. --- Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055
[jira] [Updated] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7339: -- Description: Steps: - Have a single region table with 15 hfiles in it. - Snapshot it. (was done using online snapshot from HBASE-7321) - Clone a snapshot - region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10) - after compaction we have hfile links and real hfiles mixed in the region - it starts splitting - creating split references, opening daughers fails - hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}} - these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly) - Since this is after the splitting PONR, this aborts the server. It then spreads to the next server. was: Steps: - Have a single region table 15 hfiles in it. - Snapshot it. - Clone a snapshot - region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10) - after compaction we have hfile links and real hfiles mixed. - it starts splitting - creating split references, opening daughers fails - hfile links are split, creating hfile link daughter refs. hfile-region-table.parentregion - these split hfile links are interpreted as hfile links with table table.parentregion - Since this is after the splitting PONR, this aborts the server. It then spreads to the next server. Splitting a hfilelink causes region servers to go down. --- Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055 Steps: - Have a single region table with 15 hfiles in it. - Snapshot it. 
(was done using online snapshot from HBASE-7321) - Clone a snapshot - region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10) - after compaction we have hfile links and real hfiles mixed in the region - it starts splitting - creating split references, opening daughers fails - hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}} - these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly) - Since this is after the splitting PONR, this aborts the server. It then spreads to the next server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
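The "groupings interpreted incorrectly" problem can be reproduced with two toy patterns. The regexes below are hypothetical and written only for this illustration; they are not the patterns HBase uses, and `LinkNameDemo` is an invented class name. The first method parses the name as a link straight away, the second strips a reference suffix before parsing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LinkNameDemo {
    // Hypothetical patterns for illustration; NOT the regexes HBase uses.
    static final Pattern LINK = Pattern.compile("^([^-]+)-([^-]+)-(.+)$"); // hfile-region-table
    static final Pattern REFERENCE = Pattern.compile("^(.+)\\.([^.]+)$");  // name.parentregion

    /** Table name a naive link parser would extract, or null if the name is not a link. */
    static String tableIfParsedAsLink(String name) {
        Matcher m = LINK.matcher(name);
        return m.matches() ? m.group(3) : null;
    }

    /** Strips a reference suffix first, then parses the remainder as a link. */
    static String tableIfReferenceCheckedFirst(String name) {
        Matcher ref = REFERENCE.matcher(name);
        String base = ref.matches() ? ref.group(1) : name;
        return tableIfParsedAsLink(base);
    }
}
```

For `hfile-region-table.parentregion`, the naive parser yields the table `table.parentregion`, which is the misreading described above, while checking for the reference suffix first yields `table`. In this toy version a table name that itself contains a dot would still be ambiguous.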
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530273#comment-13530273 ] Todd Lipcon commented on HBASE-7317: We can't put it in org.apache.* unless it's an Apache project. If you want to submit it to the incubator as a project I would be interested in joining up, but our thinking at the time of development was that it's a small enough piece of code that it would be easier to just develop on github until it got traction in a bunch of projects. There's no restriction that Apache projects only depend on other Apache projects - eg we depend on Google libraries like protobuf and guava. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor
[jira] [Commented] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530274#comment-13530274 ] Jonathan Hsieh commented on HBASE-7339: --- I'm going to pursue #1 and then #2 first. Splitting a hfilelink causes region servers to go down. --- Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055
[jira] [Comment Edited] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530267#comment-13530267 ] Jonathan Hsieh edited comment on HBASE-7339 at 12/12/12 8:06 PM: - This was encountered when testing online snapshots, but will affect offline snapshots as well. Suggested solutions: 1) Make opening the hfile-link daughter reference more robust, by attempting to treat as a reference if treating as link fails. Hacky but should work. 2) Change the regex's used to differentiate references and hfilelinks more strict so that we can differentiate. Hacky, not sure if it will work. 3) Change daughter reference link file name to be more robust. Currently 'hfile.parentregion', maybe chanage to 'hfile@parentregion'. This would then allow 'hfile\-region\-table@parentreigon' to be interpreted correctly. This is the right way but breaks compatibility Other follow-ons -- ideally we are more robust by quarantining a bad region or hfiles/linksfiles if it has killed a few nodes in the cluster. was (Author: jmhsieh): This was encountered when testing online snapshots, but will affect offline snapshots as well. Suggested solutions: 1) Make opening the hfile-link daughter reference more robust, by attempting to treat as a reference if treating as link fails. Hacky but should work. 2) Change the regex's used to differentiate references and hfilelinks more strict so that we can differentiate. Hacky, not sure if it will work. 3) Change daughter reference link file name to be more robust. Currently 'hfile.parentregion', maybe chanage to 'hfile@parentregion'. This would then allow 'hfile-region-table@parentreigon' to be interpreted correctly. This is the right way but breaks compatibility Other follow-ons -- ideally we are more robust by quarantining a bad region or hfiles/linksfiles if it has killed a few nodes in the cluster. Splitting a hfilelink causes region servers to go down. 
--- Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055 Steps: - Have a single region table with 15 hfiles in it. - Snapshot it. (was done using online snapshot from HBASE-7321) - Clone a snapshot - region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10) - after compaction we have hfile links and real hfiles mixed in the region - it starts splitting - creating split references, opening daughers fails - hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}} - these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly) - Since this is after the splitting PONR, this aborts the server. It then spreads to the next server. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530275#comment-13530275 ] Andrew Purtell commented on HBASE-7317: --- I'm pretty sure the thinking is a grant of this code to the Apache Hadoop project, not the formation of a full fledged project. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530277#comment-13530277 ] Enis Soztutar commented on HBASE-7243: -- Also, can you interrupt the Worker thread on timeout? Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
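Interrupting the worker on timeout could be done along these lines. This is a generic sketch using plain `java.lang.Thread`; `TimedWorker` and `runWithTimeout` are hypothetical names and the code is not taken from the attached diff. It assumes the worker reacts to interruption (for example, it blocks in calls that throw `InterruptedException`) and that the timeout is greater than zero, since `join(0)` waits forever.

```java
/** Hypothetical helper: waits for a worker thread and interrupts it on timeout. */
final class TimedWorker {
    /** Returns true if the worker finished in time, false if it had to be interrupted. */
    static boolean runWithTimeout(Thread worker, long timeoutMs) {
        worker.start();
        try {
            worker.join(timeoutMs);   // wait at most timeoutMs (must be > 0)
            if (!worker.isAlive()) {
                return true;
            }
            worker.interrupt();       // ask the worker to stop
            worker.join();            // assumes the worker honours the interrupt
        } catch (InterruptedException e) {
            // The waiting thread itself was interrupted: restore the flag and stop the worker too.
            Thread.currentThread().interrupt();
            worker.interrupt();
        }
        return false;
    }
}
```

A test would typically fail with a timeout message when `runWithTimeout` returns false, instead of leaving the worker running in the background.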
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530278#comment-13530278 ] Hadoop QA commented on HBASE-7268: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560337/HBASE-7268-v2.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 104 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 22 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3495//console This message is automatically generated. correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). 
Multiple client threads report, from the RegionMoved exception processing logic, that R moved from C to B, even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C; I assume it's from meta via some other thread... Then the put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as yet unknown reason.
[jira] [Updated] (HBASE-7339) Splitting a hfilelink causes region servers to go down.
[ https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Hsieh updated HBASE-7339: -- Description:
Steps:
- Have a single region table t with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot to table t'.
- t' has its region do a post-open task that attempts to compact region. policy does not compact all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- t' starts splitting
- creating split references, opening daughters fails
- hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server. It then spreads to the next server.
was:
Steps:
- Have a single region table with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot
- region post-open task attempts to compact region. policy does not compact all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- it starts splitting
- creating split references, opening daughters fails
- hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server. It then spreads to the next server.
Splitting a hfilelink causes region servers to go down. --- Key: HBASE-7339 URL: https://issues.apache.org/jira/browse/HBASE-7339 Project: HBase Issue Type: Sub-task Components: snapshots Affects Versions: hbase-6055 Reporter: Jonathan Hsieh Assignee: Jonathan Hsieh Priority: Blocker Fix For: hbase-6055
Steps:
- Have a single region table t with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot to table t'.
- t' has its region do a post-open task that attempts to compact region. policy does not compact all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- t' starts splitting
- creating split references, opening daughters fails
- hfile links are split, creating hfile link daughter refs. {{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table {{table.parentregion}} - {{hfile\-region\-table.parentregion}} (groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server. It then spreads to the next server.
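The "groupings interpreted incorrectly" step can be illustrated with a small regex experiment. This is a sketch under assumptions, not the actual HFileLink code: the LINK pattern, the class LinkNameSketch and the sample names are hypothetical, and the only thing taken from the report is the layout hfile-region-table for a link and hfile-region-table.parentregion for a split reference to a link.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the pattern and names are illustrative, not HBase's.
final class LinkNameSketch {
    // Assumed link layout "hfile-region-table"; the table group is allowed to contain dots.
    static final Pattern LINK =
        Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)-([a-zA-Z_0-9.]+)$");

    /** Returns the table name the pattern extracts, or null if the name is not a link. */
    static String tableOf(String fileName) {
        Matcher m = LINK.matcher(fileName);
        return m.matches() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        String link = "abc123-def456-t2";
        // Daughter reference to a link: "<link>.<parentregion>"
        String splitRef = link + ".0a1b2c";
        System.out.println(tableOf(link));     // table group is "t2"
        System.out.println(tableOf(splitRef)); // table group swallows ".0a1b2c"
    }
}
```

Because the table group accepts dots in this sketch, the split reference still matches the link pattern and the parent region suffix ends up inside the table name, which is the kind of misreading the report describes.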
[jira] [Created] (HBASE-7340) Allow user-specified actions following region movement
Ted Yu created HBASE-7340: - Summary: Allow user-specified actions following region movement Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu Sometimes a user performs a compaction after a region is moved (by the balancer). We should provide a 'hook' which lets the user specify what follow-on actions to take after region movement. See the discussion on the user mailing list under the thread 'How to know it's time for a major compaction?' for background information.
[jira] [Commented] (HBASE-7327) Assignment Timeouts: Remove the code from the master
[ https://issues.apache.org/jira/browse/HBASE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530313#comment-13530313 ] nkeywal commented on HBASE-7327: I've got some doubts about TestMasterFailover. The way the code is written, on a master failover we look at what is in zk and, if the regionserver is down, force a reassign; if not, we put the region in the RIT list. Many tests in TestMasterFailover put a given state in ZK, but keep the regionserver up. This way, it's actually the timeout that is managing the region status. It's fast because the timeout is set to a few seconds. But we should have a test with a real failover, with standard cases, and they should be fast without setting the timeout to 2 seconds or so. So:
- this test shows a specific usage of the timeout: being a garbage collector when we put ourselves in an unexpected situation
- it doesn't prove that we're effectively recovering quickly when we have a master failover, because the very short timeout hides the problem.
As an example, it seems that if the master fails just after creating an offline znode (before contacting the region server), we need the timeout to recover the region (i.e. 10 minutes). If confirmed (I will recheck tomorrow), it would be a bug (not that simple to fix actually), but we don't see it because of this short timeout. And so, I'm thinking about:
- refactoring the tests to express the cases that can occur during a master failover (including a region server crash, but maybe that exists already)
- keeping the timeout, but as a safety net only, without doing anything if the region is allocated to a live region server. Maybe we will need extra cases here; I need to study the code more.
- maybe adding extra code if we identify a region that has been opening for too long on a live server: calling the server to check its status, release the region or something similar.
To be discussed :-) Assignment Timeouts: Remove the code from the master Key: HBASE-7327 URL: https://issues.apache.org/jira/browse/HBASE-7327 Project: HBase Issue Type: Improvement Components: master Affects Versions: 0.96.0 Reporter: nkeywal Assignee: nkeywal Attachments: 7327.v1.uncomplete.patch As per HBASE-7247... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
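The failover behaviour described in the comment above can be written down as a tiny decision function. This is a minimal sketch under assumptions: FailoverSketch, Action, onFailover and worstCaseRecoveryMillis are hypothetical names, and the "only revisited when the timeout fires" rule encodes the commenter's unconfirmed suspicion, not verified master behaviour.

```java
// Hypothetical sketch: the enum and method names are illustrative, not HBase's.
final class FailoverSketch {
    enum Action { FORCE_REASSIGN, ADD_TO_RIT }

    /** Decision taken for a region found in ZooKeeper after a master failover. */
    static Action onFailover(boolean regionServerAlive) {
        return regionServerAlive ? Action.ADD_TO_RIT : Action.FORCE_REASSIGN;
    }

    /** Worst-case wait before a stuck region is handled again (assumed model). */
    static long worstCaseRecoveryMillis(boolean regionServerAlive, long assignmentTimeoutMillis) {
        // Assumption: a region parked in the RIT list on a live server is only
        // revisited when the assignment timeout fires.
        return onFailover(regionServerAlive) == Action.ADD_TO_RIT ? assignmentTimeoutMillis : 0L;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseRecoveryMillis(true, 2_000L));   // test-style timeout
        System.out.println(worstCaseRecoveryMillis(true, 600_000L)); // 10 minute timeout
    }
}
```

Under this model a test that sets the timeout to about 2 seconds looks fast even though the same path would take the full timeout with a 10 minute setting, which is the masking effect the comment worries about.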
[jira] [Commented] (HBASE-7236) add per-table/per-cf configuration via metadata
[ https://issues.apache.org/jira/browse/HBASE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530315#comment-13530315 ] Hadoop QA commented on HBASE-7236: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560500/HBASE-7236-v1.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 25 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 105 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.client.TestShell org.apache.hadoop.hbase.TestDrainingServer org.apache.hadoop.hbase.client.TestMultiParallel Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3496//console This message is automatically generated. 
add per-table/per-cf configuration via metadata --- Key: HBASE-7236 URL: https://issues.apache.org/jira/browse/HBASE-7236 Project: HBase Issue Type: New Feature Components: Compaction Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE-v1.patch, HBASE-7236-v0.patch, HBASE-7236-v1.patch Regardless of the compaction policy, it makes sense to have separate configuration for compactions for different tables and column families, as their access patterns and workloads can be different. In particular, for tiered compactions that are being ported from 0.89-fb branch it is necessary to have, to use it properly. We might want to add support for compaction configuration via metadata on table/cf. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7340) Allow user-specified actions following region movement
[ https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Otis Gospodnetic updated HBASE-7340: Description: Sometimes user performs compaction after a region is moved (by balancer). We should provide 'hook' which lets user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+ was: Sometimes user performs compaction after a region is moved (by balancer). We should provide 'hook' which lets user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information Allow user-specified actions following region movement -- Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu Sometimes user performs compaction after a region is moved (by balancer). We should provide 'hook' which lets user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
[ https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530324#comment-13530324 ] Hadoop QA commented on HBASE-7338: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12560621/HBASE-7338.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests. {color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 104 warning messages. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 findbugs{color}. The patch appears to introduce 23 new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestMasterMetrics Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3497//console This message is automatically generated. Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange - Key: HBASE-7338 URL: https://issues.apache.org/jira/browse/HBASE-7338 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3, 0.96.0 Reporter: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7338.patch The balancer doesn't run in case a region is in-transition. The check to confirm whether there all regions are assigned looks for region count 22, where the total regions are 27. This may result in a failure: {code} java.lang.AssertionError: After 5 attempts, region assignments were not balanced. 
at org.junit.Assert.fail(Assert.java:93) at org.apache.hadoop.hbase.TestRegionRebalancing.assertRegionsAreBalanced(TestRegionRebalancing.java:203) at org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:123) . 2012-12-11 13:47:02,231 INFO [pool-1-thread-1] hbase.TestRegionRebalancing(120): Added fourth server=p0118.mtv.cloudera.com,44414,1355262422083 2012-12-11 13:47:02,231 INFO [RegionServer:3;p0118.mtv.cloudera.com,44414,1355262422083] regionserver.HRegionServer(3769): Registered RegionServer MXBean 2012-12-11 13:47:02,231 DEBUG [pool-1-thread-1] master.HMaster(987): Not running balancer because 1 region(s) in transition: {c786446fb2542f190e937057cdc79d9d=test,kkk,1355262401365.c786446fb2542f190e937057cdc79d9d. state=OPENING, ts=1355262421037, server=p0118.mtv.cloudera.com,54281,1355262419765} 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(165): There are 4 servers and 26 regions. Load Average: 13.0 low border: 9, up border: 16; attempt: 0 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,51590,1355262395329 Avg: 13.0 actual: 11 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,52987,1355262407916 Avg: 13.0 actual: 15 2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] hbase.TestRegionRebalancing(171):
[jira] [Created] (HBASE-7341) Deprecate RowLocks in 0.94
Gregory Chanan created HBASE-7341: - Summary: Deprecate RowLocks in 0.94 Key: HBASE-7341 URL: https://issues.apache.org/jira/browse/HBASE-7341 Project: HBase Issue Type: Task Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.94.4 Since we are removing support in 0.96 (see HBASE-7315), we should deprecate in 0.94. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HBASE-7022) Use multi to batch offline regions in zookeeper
[ https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jimmy Xiang resolved HBASE-7022. Resolution: Won't Fix Assignee: Jimmy Xiang Patched ZooKeeper with async multi support. Tried to use it to batch offline regions, but did not get the performance gain I expected. Use multi to batch offline regions in zookeeper --- Key: HBASE-7022 URL: https://issues.apache.org/jira/browse/HBASE-7022 Project: HBase Issue Type: Bug Reporter: Jimmy Xiang Assignee: Jimmy Xiang The bulk assigner needs to set regions offline in zookeeper one by one. I was wondering whether we could get some performance improvement by batching these operations using ZooKeeper#multi.
[jira] [Created] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
Aleksandr Shulman created HBASE-7342: Summary: Split operation without split key incorrectly finds the middle key in off-by-one error Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.3, 0.94.2, 0.94.1, 0.96.0, 0.94.4 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.2
I took a deeper look into issues I was having with region splitting when specifying a region (but not a key for splitting). The midkey calculation is off by one and, when there are 2 rows, will pick the 0th one. This causes the firstkey to be the same as the midkey, and the split will fail. Removing the -1 causes it to work correctly, as per the test I've added. Looking into the code, here is what goes on:
1. Split takes the largest storefile.
2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key i resides at blockKeys[i].
3. Getting the middle root-level index should yield the key in the middle of the storefile.
4. In step 3, we see that there is a possibly erroneous (-1) to adjust for the 0-offset indexing.
5. In a case where there are only 2 blockKeys, this yields the 0th block key.
6. Unfortunately, this is the same block key that 'firstKey' will be.
7. This yields the result in HStore.java:1873 (cannot split because midkey is the same as first or last row).
8. Removing the -1 solves the problem (in this case).
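The arithmetic behind steps 4 to 8 can be checked in isolation. This is a minimal sketch under assumptions: the report does not quote the exact expression, so (keyCount - 1) / 2 versus keyCount / 2 is an assumed reading of "the -1", and MidKeySketch with its methods is a hypothetical stand-in, not the HFile block index code.

```java
// Hypothetical sketch: names and formulas are illustrative, not HBase's actual code.
final class MidKeySketch {
    /** Middle index computed with the extra -1 (assumed form of step 4). */
    static int midIndexWithMinusOne(int keyCount) {
        return (keyCount - 1) / 2;
    }

    /** Middle index computed after removing the -1 (assumed form of step 8). */
    static int midIndexWithoutMinusOne(int keyCount) {
        return keyCount / 2;
    }

    /** Simplified split check: only compares the midkey with the first key. */
    static boolean canSplit(String[] blockKeys, int midIndex) {
        String firstKey = blockKeys[0];
        return !blockKeys[midIndex].equals(firstKey);
    }

    public static void main(String[] args) {
        String[] blockKeys = {"row-a", "row-b"};
        int before = midIndexWithMinusOne(blockKeys.length);
        int after = midIndexWithoutMinusOne(blockKeys.length);
        System.out.println("with -1: index " + before + ", can split: " + canSplit(blockKeys, before));
        System.out.println("without -1: index " + after + ", can split: " + canSplit(blockKeys, after));
    }
}
```

With two block keys, integer division sends (2 - 1) / 2 to index 0, the same key as firstKey, while 2 / 2 selects index 1. The comparison against the last row mentioned in step 7 is left out of this sketch.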
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530403#comment-13530403 ] Todd Lipcon commented on HBASE-7317: I wouldn't want to put it in Hadoop common -- then we'd have to do elaborate stubbing in our compat code in order to use it while still supporting older versions. It is also useful for non-Hadoop projects (eg something like Cassandra) server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
[ https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530404#comment-13530404 ] Andrew Purtell commented on HBASE-7331: --- All tests pass for me locally except for TestHBaseFsck#testRegionShouldNotBeDeployed, which seems an unrelated failure. Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server. -- Key: HBASE-7331 URL: https://issues.apache.org/jira/browse/HBASE-7331 Project: HBase Issue Type: Sub-task Components: regionserver, security Affects Versions: 0.94.3, 0.96.0 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.94.3, 0.96.0 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk.patch The following APIs in HRegionServer are either missing hooks to coprocessor or the hooks are not implemented in the AccessController class for security. As a result any unauthorized user can: 1.Open a region 2. Close a region 3. Stop region server 4. Lock a row 5. Unlock a row. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition
[ https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530405#comment-13530405 ] Jimmy Xiang commented on HBASE-7335: This region should be a daughter region. The region split should have succeeded. It looks to me as if the parent region was removed while daughter regions still referred to it. Since the parent region is gone, we have no choice but to remove the reference file. Have you lost any data? Failed split can cause a region to get stuck in transition -- Key: HBASE-7335 URL: https://issues.apache.org/jira/browse/HBASE-7335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.1 Reporter: Kyle McGovern Trying to reassign a region after a failed split causes that region to get stuck in transition. hdfs dfs -R output http://pastebin.com/F4DgTxj1 hbck output http://pastebin.com/BaftESBd error on regionserver http://pastebin.com/Mye60rUA For example, if I remove /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590 the region will successfully assign and hbck does not show errors for this region anymore. The contents of the file appear to just be a split key.
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530410#comment-13530410 ] Andrew Purtell commented on HBASE-7317: --- {quote} Todd: I wouldn't want to put it in Hadoop common – then we'd have to do elaborate stubbing in our compat code in order to use it while still supporting older versions. It is also useful for non-Hadoop projects (eg something like Cassandra) {quote} That's disappointing. Then my concern about depending on a project in this state stands. {quote} Stack: Hope was that we'd add tracing to hbase w/ this as a start (and that hadoop itself would be adding trace I suppose so we could go down into datanodes). If no progress on tracing before, say 0.96, yeah, lets remove it. But maybe there will be progress made in this issue. {quote} Perhaps, otherwise +1 for removing it for 0.96. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). 
[jira] [Updated] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-7342: -- Affects Version/s: (was: 0.94.4) Fix Version/s: (was: 0.94.2) 0.94.4 Split operation without split key incorrectly finds the middle key in off-by-one error -- Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.4
I took a deeper look into issues I was having with region splitting when specifying a region (but not a key for splitting). The midkey calculation is off by one and, when there are 2 rows, will pick the 0th one. This causes the firstkey to be the same as the midkey, and the split will fail. Removing the -1 causes it to work correctly, as per the test I've added. Looking into the code, here is what goes on:
1. Split takes the largest storefile.
2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key i resides at blockKeys[i].
3. Getting the middle root-level index should yield the key in the middle of the storefile.
4. In step 3, we see that there is a possibly erroneous (-1) to adjust for the 0-offset indexing.
5. In a case where there are only 2 blockKeys, this yields the 0th block key.
6. Unfortunately, this is the same block key that 'firstKey' will be.
7. This yields the result in HStore.java:1873 (cannot split because midkey is the same as first or last row).
8. Removing the -1 solves the problem (in this case).
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530424#comment-13530424 ] Todd Lipcon commented on HBASE-7317: bq. That's disappointing. Then my concern about depending on a project in this state stands. What do you mean? If there are bugs in the code, feel free to submit patches, and I'm happy to integrate them (I have commit access to the repo). If we end up with several contributors, I don't foresee any issues proposing it for Apache incubation. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor I've seen cases during integration tests where the write or read request took an unexpectedly large amount of time (that, after the client went to the region server that is reported alive and well, which I know from temporary debug logging :)), and it's impossible to understand what is going on on the server side, short of catching the moment with jstack. Some solutions (off by default) could be - a facility for tests (especially integration tests) that would trace Server/Master calls into some log or file (won't help with internals but at least one could see what was actually received); - logging the progress of requests between components inside master/server (e.g. request id=N received, request id=N is being processed in MyClass, N being drawn on client from local sequence - no guarantees of uniqueness are necessary). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
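The second proposal in the description (request ids drawn from a local sequence and logged as the request moves between components) could look roughly like the sketch below. Everything here is hypothetical: `RequestTraceSketch`, the `enabled` switch, and the in-memory `TRACE` list stand in for whatever logging facility would actually be used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RequestTraceSketch {
    // Local sequence on the client; ids are not guaranteed unique across clients.
    private static final AtomicLong LOCAL_SEQUENCE = new AtomicLong();

    // Off by default, as the proposal suggests.
    static volatile boolean enabled = false;

    // In-memory sink standing in for a log or file (assumption for this sketch).
    static final List<String> TRACE = Collections.synchronizedList(new ArrayList<String>());

    static long nextRequestId() {
        return LOCAL_SEQUENCE.incrementAndGet();
    }

    // Records one progress event for a request, but only when tracing is on.
    static void trace(long requestId, String event) {
        if (enabled) {
            TRACE.add("request id=" + requestId + " " + event);
        }
    }

    public static void main(String[] args) {
        enabled = true;
        long id = nextRequestId();
        trace(id, "received");
        trace(id, "is being processed in MyClass");
        for (String line : TRACE) {
            System.out.println(line);
        }
    }
}
```

Because the id only needs to correlate log lines for one client, a plain counter is enough; no coordination between clients is assumed.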
[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug
[ https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530429#comment-13530429 ] Andrew Purtell commented on HBASE-7317: --- {quote} If there are bugs in the code, feel free to submit patches, and I'm happy to integrate them (I have commit access to the repo). If we end up with several contributors, I don't foresee any issues proposing it for Apache incubation. {quote} If there's progress on tracing, and certainly if this happens, then I won't be concerned, yes. server-side request problems are hard to debug -- Key: HBASE-7317 URL: https://issues.apache.org/jira/browse/HBASE-7317 Project: HBase Issue Type: Brainstorming Components: IPC/RPC, regionserver Reporter: Sergey Shelukhin Priority: Minor
[jira] [Updated] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7233: - Attachment: 7233v5_encoders.txt Moved stuff around per your review, [~mcorgan]. I removed Encoder and Decoder; they add little. Yeah, it means IOException, but most of the time that's what we'll be throwing at its base when encoding/decoding. I think we need to rename CellScanner to CellInputStream and change the method name from next to read, especially when you look at this patch. What do you think, Matt? Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt Undo KeyValue being a Writable.
[jira] [Updated] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksandr Shulman updated HBASE-7342: - Attachment: HBASE-7342-v1.patch Split operation without split key incorrectly finds the middle key in off-by-one error -- Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7342-v1.patch
[jira] [Updated] (HBASE-7236) add per-table/per-cf configuration via metadata
[ https://issues.apache.org/jira/browse/HBASE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HBASE-7236: Attachment: HBASE-7236-v2.patch Fix TestShell. TestMultiParallel and TestDrainingServer are flaky and pass on local. add per-table/per-cf configuration via metadata --- Key: HBASE-7236 URL: https://issues.apache.org/jira/browse/HBASE-7236 Project: HBase Issue Type: New Feature Components: Compaction Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE-v1.patch, HBASE-7236-v0.patch, HBASE-7236-v1.patch, HBASE-7236-v2.patch Regardless of the compaction policy, it makes sense to have separate configuration for compactions for different tables and column families, as their access patterns and workloads can be different. In particular, for tiered compactions that are being ported from 0.89-fb branch it is necessary to have, to use it properly. We might want to add support for compaction configuration via metadata on table/cf. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-7243: Attachment: 7243-integration-test-many-splits.diff Done and done. Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-7243: Status: Open (was: Patch Available) Canceling request for first patch, which hasn't run yet. Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Dimiduk updated HBASE-7243: Status: Patch Available (was: Open) Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk - first slice (not configurable by cf or dynamically)
[ https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530459#comment-13530459 ] Enis Soztutar commented on HBASE-7055: -- The patch at RB looks good to go. Ted, Stack do you guys want to review? port HBASE-6371 tier-based compaction from 0.89-fb to trunk - first slice (not configurable by cf or dynamically) - Key: HBASE-7055 URL: https://issues.apache.org/jira/browse/HBASE-7055 Project: HBase Issue Type: Task Components: Compaction Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Fix For: 0.96.0 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, HBASE-6371-v3-refactor-only-squashed.patch, HBASE-6371-v4-refactor-only-squashed.patch, HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch There's divergence in the code :( See HBASE-6371 for details. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530463#comment-13530463 ]
Ted Yu commented on HBASE-7342:

{code}
+System.out.println("Original table has: " + loadedTableCount + " rows");
{code}
Please use the LOG variable for the above.
{code}
+ Thread.currentThread();
{code}
Does the above statement have any effect?
{code}
+Thread.sleep(1000);
{code}
Can the sleep duration be shorter?
{code}
+ } catch (InterruptedException e) {
+e.printStackTrace();
{code}
Throw InterruptedIOException from the catch block.
{code}
+return;
+
{code}
nit: remove the empty line.
{code}
+throw new Exception("Split did not increase the number of regions");
{code}
nit: use fail().

Split operation without split key incorrectly finds the middle key in off-by-one error
Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7342-v1.patch
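Taken together, the review comments on the test (log through LOG, sleep for less than a second, rethrow interruption as InterruptedIOException) suggest a polling helper along the lines of the sketch below. It is an illustration only: `SplitWaitSketch`, `waitForRegionCount`, and the 50 ms poll interval are made up here, and `java.util.logging` stands in for the logging API the test class actually uses.

```java
import java.io.InterruptedIOException;
import java.util.function.IntSupplier;
import java.util.logging.Logger;

public class SplitWaitSketch {
    // Stand-in for the test's LOG variable (assumption: the real test uses another logging API).
    private static final Logger LOG = Logger.getLogger(SplitWaitSketch.class.getName());

    /**
     * Polls regionCount until it reaches expected or the timeout elapses.
     * Returns false on timeout so a JUnit caller can invoke fail() with its own message.
     */
    static boolean waitForRegionCount(IntSupplier regionCount, int expected, long timeoutMs)
            throws InterruptedIOException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (regionCount.getAsInt() < expected) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(50); // short poll interval instead of a fixed one-second sleep
            } catch (InterruptedException e) {
                // Restore the interrupt flag and surface the interruption as an I/O error.
                Thread.currentThread().interrupt();
                InterruptedIOException iioe =
                        new InterruptedIOException("Interrupted while waiting for split");
                iioe.initCause(e);
                throw iioe;
            }
        }
        LOG.info("Region count reached at least " + expected);
        return true;
    }

    public static void main(String[] args) throws Exception {
        int[] polls = {0};
        // Simulated table: reports 1 region for two polls, then 2 regions.
        boolean ok = waitForRegionCount(() -> ++polls[0] >= 3 ? 2 : 1, 2, 1000);
        System.out.println("split observed: " + ok);
    }
}
```

In a JUnit test the caller would turn a `false` return into `fail("Split did not increase the number of regions")` rather than throwing a generic Exception.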
[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition
[ https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530469#comment-13530469 ] Kyle McGovern commented on HBASE-7335: -- Thanks for the link to the JIRA. It doesn't appear there was any data loss. Failed split can cause a region to get stuck in transition -- Key: HBASE-7335 URL: https://issues.apache.org/jira/browse/HBASE-7335 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.92.1 Reporter: Kyle McGovern Trying to reassign a region after a failed split causes that region to get stuck in transition. hdfs dfs -R output: http://pastebin.com/F4DgTxj1 hbck output: http://pastebin.com/BaftESBd error on regionserver: http://pastebin.com/Mye60rUA For example, if I remove /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590 the region will successfully assign and hbck no longer shows errors for this region. The contents of the file appear to be just a split key.
[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530472#comment-13530472 ]
Ted Yu commented on HBASE-7342:

There are compilation errors:
{code}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile (default-testCompile) on project hbase-server: Compilation failure: Compilation failure:
[ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[40,30] cannot find symbol
[ERROR] symbol : class HServerAddress
[ERROR] location: package org.apache.hadoop.hbase
[ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[763,21] cannot find symbol
[ERROR] symbol : class HServerAddress
[ERROR] location: class org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
[ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[763,48] cannot find symbol
[ERROR] symbol : method getRegionsInfo()
[ERROR] location: class org.apache.hadoop.hbase.client.HTable
[ERROR] /Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[772,17] cannot find symbol
[ERROR] symbol : method getRegionsInfo()
[ERROR] location: class org.apache.hadoop.hbase.client.HTable
{code}
HServerAddress is replaced by ServerName in trunk. getRegionsInfo() is replaced by getRegionLocations.

Split operation without split key incorrectly finds the middle key in off-by-one error
Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7342-v1.patch
[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error
[ https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530478#comment-13530478 ] Aleksandr Shulman commented on HBASE-7342: -- Noted...let me take a look. Split operation without split key incorrectly finds the middle key in off-by-one error -- Key: HBASE-7342 URL: https://issues.apache.org/jira/browse/HBASE-7342 Project: HBase Issue Type: Bug Components: HFile, io Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0 Reporter: Aleksandr Shulman Assignee: Aleksandr Shulman Priority: Minor Fix For: 0.96.0, 0.94.4 Attachments: HBASE-7342-v1.patch
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530483#comment-13530483 ] stack commented on HBASE-7243: -- +1 from me too. Will commit after hadoopqa run... Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement
[ https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530485#comment-13530485 ]
Ted Yu commented on HBASE-7340:

In HMaster.moveRegion(), we already have:
{code}
this.assignmentManager.balance(rp);
if (this.cpHost != null) {
  this.cpHost.postMove(hri, rp.getSource(), rp.getDestination());
}
{code}
This means a user can register a master coprocessor that receives region movement notifications. The assignmentManager.balance(plan) call in HMaster.balance() doesn't send out such a notification. I think we can either add a notification per region moved, or enhance the following hook (at line 1335) with the list of regions moved:
{code}
this.cpHost.postBalance();
{code}
Comments are welcome.

Allow user-specified actions following region movement
Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu

Sometimes a user performs a compaction after a region is moved (by the balancer). We should provide a 'hook' that lets the user specify what follow-on actions to take after region movement. See the discussion on the user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+
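The two options raised in the comment (a notification per moved region, or a postBalance hook that carries the list of moved regions) can be compared in a small self-contained sketch. The `Plan` class, the `MoveObserver` interface, and the `balance` method below are hypothetical stand-ins, not the HBase coprocessor API; only the hook names postMove and postBalance come from the comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BalanceHookSketch {
    /** Hypothetical stand-in for a region plan: one region moving between two servers. */
    public static final class Plan {
        public final String region;
        public final String source;
        public final String destination;

        public Plan(String region, String source, String destination) {
            this.region = region;
            this.source = source;
            this.destination = destination;
        }
    }

    /** Hypothetical observer; the real coprocessor interface has different signatures. */
    public interface MoveObserver {
        void postMove(Plan plan);                   // option 1: one call per moved region
        void postBalance(List<Plan> movedRegions);  // option 2: one call with the whole list
    }

    /** Executes the plans (simulated here) and fires both kinds of notification. */
    public static int balance(List<Plan> plans, MoveObserver observer) {
        List<Plan> moved = new ArrayList<>();
        for (Plan plan : plans) {
            // A real master would hand the plan to its assignment manager at this point.
            moved.add(plan);
            if (observer != null) {
                observer.postMove(plan);
            }
        }
        if (observer != null) {
            observer.postBalance(moved);
        }
        return moved.size();
    }

    public static void main(String[] args) {
        List<Plan> plans = Arrays.asList(
                new Plan("region-1", "serverA", "serverB"),
                new Plan("region-2", "serverA", "serverC"));
        balance(plans, new MoveObserver() {
            @Override
            public void postMove(Plan plan) {
                System.out.println("moved " + plan.region + " to " + plan.destination);
            }

            @Override
            public void postBalance(List<Plan> movedRegions) {
                // A follow-on action such as scheduling compactions could go here.
                System.out.println("balance finished, regions moved: " + movedRegions.size());
            }
        });
    }
}
```

The per-region callback lets an observer react as each move completes, while the list-carrying hook lets it batch follow-on work such as compactions after the balancer run.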
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530487#comment-13530487 ] stack commented on HBASE-7243: -- Maybe hadoopqa is messing w/ us again and ain't running. Let me commit this. Usually we name patches w/ a version going forward: i.e. the third version has a v3 or something on it... FYI. Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530490#comment-13530490 ] Nick Dimiduk commented on HBASE-7243: - Rgr. While you have your infra hat on: review board isn't posting notifications back to JIRA. Configuration bug? Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7243) Test for creating a large number of regions
[ https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-7243: - Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to trunk. Thanks Nick for the patch. Test for creating a large number of regions --- Key: HBASE-7243 URL: https://issues.apache.org/jira/browse/HBASE-7243 Project: HBase Issue Type: Bug Components: Region Assignment, regionserver, test Reporter: Enis Soztutar Assignee: Nick Dimiduk Labels: noob Fix For: 0.96.0 Attachments: 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff After HBASE-7220, I think it will be good to write a unit test/IT to create a large number of regions. We can put a reasonable timeout to the test. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530499#comment-13530499 ] Jean-Daniel Cryans commented on HBASE-7325: --- [~gabriel.reid] alright +1 [~lhofhansl], I'm going to commit this to trunk but I was wondering if you'd want this in 0.94? Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
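The figures in the description can be checked with a little arithmetic. The sketch below assumes the multiplier grows by one on each pass that finds nothing to replicate, which is a reading of the description rather than something taken from the ReplicationSource code; the class and method names are hypothetical.

```java
public class ReplicationBackoffDemo {
    // Sleep for one pass: base sleep time multiplied by the current multiplier.
    static long sleepSeconds(long baseSeconds, int multiplier) {
        return baseSeconds * multiplier;
    }

    // Total idle time accumulated while the multiplier climbs from 1 to maxMultiplier.
    static long idleSecondsUntilMaxSleep(long baseSeconds, int maxMultiplier) {
        long total = 0;
        for (int m = 1; m <= maxMultiplier; m++) {
            total += sleepSeconds(baseSeconds, m);
        }
        return total;
    }

    public static void main(String[] args) {
        long base = 1;           // 1 second base retry sleep time (default per the description)
        int maxMultiplier = 10;  // maximum multiplier (default per the description)

        // 1 + 2 + ... + 10 = 55 seconds of quiet time before the sleep is at its maximum.
        System.out.println("idle seconds: " + idleSecondsUntilMaxSleep(base, maxMultiplier)); // prints 55
        // At that point a new edit may wait a full maximum-length sleep before shipping.
        System.out.println("worst-case delay: " + sleepSeconds(base, maxMultiplier)); // prints 10
    }
}
```

Under that assumption, roughly 55 seconds without writes is enough to push the next replication delay to the 10-second maximum, which matches the numbers quoted in the report.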
[jira] [Commented] (HBASE-7233) Serializing KeyValues
[ https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530503#comment-13530503 ] Matt Corgan commented on HBASE-7233: Sounds good to me. The IOException on CellInputStream.read() may not be ideal since it will force its way all the way up through the StoreFileScanner, StoreHeap, StoreScanner, RegionHeap, RegionScanner, etc... I haven't thought of a better suggestion though. Can change later if we think of something. Serializing KeyValues - Key: HBASE-7233 URL: https://issues.apache.org/jira/browse/HBASE-7233 Project: HBase Issue Type: Bug Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.96.0 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt Undo KeyValue being a Writable. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
[ https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-7331: Attachment: HBASE-7331_trunk_02.patch Fixed formatting errors. One of the small tests, TestLruCache fails for me intermittently, I am not sure if there is something wrong in my set up. Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server. -- Key: HBASE-7331 URL: https://issues.apache.org/jira/browse/HBASE-7331 Project: HBase Issue Type: Sub-task Components: regionserver, security Affects Versions: 0.94.3, 0.96.0 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.94.3, 0.96.0 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk_02.patch, HBASE-7331_trunk.patch The following APIs in HRegionServer are either missing hooks to coprocessor or the hooks are not implemented in the AccessController class for security. As a result any unauthorized user can: 1.Open a region 2. Close a region 3. Stop region server 4. Lock a row 5. Unlock a row. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads
[ https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530512#comment-13530512 ] Lars Hofhansl commented on HBASE-7336: -- Any objections to committing this (0.94 and 0.96)? I'm pretty sure it won't make things worse, and it provably improves some scenarios. HFileBlock.readAtOffset does not work well with multiple threads Key: HBASE-7336 URL: https://issues.apache.org/jira/browse/HBASE-7336 Project: HBase Issue Type: Bug Reporter: Lars Hofhansl Assignee: Lars Hofhansl Priority: Critical Fix For: 0.96.0, 0.94.4 Attachments: 7336-0.94.txt, 7336-0.96.txt HBase grinds to a halt when many threads scan along the same set of blocks and neither short-circuit reads nor block caching are enabled for the dfs client ... disabling the block cache makes sense on very large scans. It turns out that synchronizing on istream in HFileBlock.readAtOffset is the culprit.
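The contention described in this issue, and the "try the stream, fall back to pread" idea raised earlier in the thread, can be sketched in plain Java. This is only an illustration under stated assumptions, not the attached patch: `SharedStreamReader`, `readAtOffset` and `pread` here are hypothetical names, a local file stands in for the DFS input stream, and a `ReentrantLock` stands in for whatever guards the shared stream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical reader (not HBase code): use the shared stream if it is free, else pread. */
class SharedStreamReader implements AutoCloseable {
    // Stateful stream: seek + read must not interleave across threads.
    private final RandomAccessFile stream;
    // Separate handle used only for positional reads, which carry their own offset.
    private final FileChannel preadChannel;
    private final ReentrantLock streamLock = new ReentrantLock();

    SharedStreamReader(Path file) throws IOException {
        this.stream = new RandomAccessFile(file.toFile(), "r");
        this.preadChannel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Reads length bytes at offset without ever blocking on the stream lock. */
    byte[] readAtOffset(long offset, int length) throws IOException {
        if (streamLock.tryLock()) {
            try {
                byte[] dest = new byte[length];
                stream.seek(offset);
                stream.readFully(dest);
                return dest;
            } finally {
                streamLock.unlock();
            }
        }
        // Another thread holds the stream: do a positional read instead of waiting.
        return pread(offset, length);
    }

    /** Positional read: loops because a single read may return fewer bytes than asked. */
    byte[] pread(long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = preadChannel.read(buf, pos);
            if (n < 0) {
                throw new EOFException("hit end of file at position " + pos);
            }
            pos += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        stream.close();
        preadChannel.close();
    }
}
```

The point of the sketch is that no reader ever waits on the lock: a thread that loses the `tryLock` race pays the cost of a positional read instead of queueing behind the other scanners.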
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530517#comment-13530517 ] stack commented on HBASE-7268: -- bq. Do we want to consider supplying open timestamp from the master too? Would that close the holes in this mechanism, the ones that we could have if the server times diverge? Building a mechanism based on comparing server times will work most of the time, but there'll be folks who will have drifting clocks and then we'll have new interesting issues. How did this, from your original report above, happen: "R moved from C to B"? correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.
[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
[ https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530522#comment-13530522 ] Andrew Purtell commented on HBASE-7331: --- bq. Fixed formatting errors. Thanks. Looks like there may still be tabs in RegionServerCoprocessorHost but I'll fix that, no worries. bq. TestLruCache fails for me intermittently That would seem unrelated. Running another round of tests with the updated patch to see what's up, if anything. Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server. -- Key: HBASE-7331 URL: https://issues.apache.org/jira/browse/HBASE-7331 Project: HBase Issue Type: Sub-task Components: regionserver, security Affects Versions: 0.94.3, 0.96.0 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.94.3, 0.96.0 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk_02.patch, HBASE-7331_trunk.patch The following APIs in HRegionServer are either missing hooks to coprocessor or the hooks are not implemented in the AccessController class for security. As a result any unauthorized user can: 1.Open a region 2. Close a region 3. Stop region server 4. Lock a row 5. Unlock a row. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.
[ https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vandana Ayyalasomayajula updated HBASE-7331: Attachment: HBASE-7331_94_02.patch Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server. -- Key: HBASE-7331 URL: https://issues.apache.org/jira/browse/HBASE-7331 Project: HBase Issue Type: Sub-task Components: regionserver, security Affects Versions: 0.94.3, 0.96.0 Reporter: Vandana Ayyalasomayajula Assignee: Vandana Ayyalasomayajula Fix For: 0.94.3, 0.96.0 Attachments: HBASE-7331_94_02.patch, HBASE-7331_94.patch, HBASE-7331_trunk_02.patch, HBASE-7331_trunk.patch The following APIs in HRegionServer are either missing hooks to coprocessor or the hooks are not implemented in the AccessController class for security. As a result any unauthorized user can: 1.Open a region 2. Close a region 3. Stop region server 4. Lock a row 5. Unlock a row. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7341) Deprecate RowLocks in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-7341: -- Attachment: HBASE-7341.patch Deprecate RowLocks in 0.94 -- Key: HBASE-7341 URL: https://issues.apache.org/jira/browse/HBASE-7341 Project: HBase Issue Type: Task Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.94.4 Attachments: HBASE-7341.patch Since we are removing support in 0.96 (see HBASE-7315), we should deprecate in 0.94. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7341) Deprecate RowLocks in 0.94
[ https://issues.apache.org/jira/browse/HBASE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gregory Chanan updated HBASE-7341: -- Status: Patch Available (was: Open) Deprecate RowLocks in 0.94 -- Key: HBASE-7341 URL: https://issues.apache.org/jira/browse/HBASE-7341 Project: HBase Issue Type: Task Reporter: Gregory Chanan Assignee: Gregory Chanan Priority: Minor Fix For: 0.94.4 Attachments: HBASE-7341.patch Since we are removing support in 0.96 (see HBASE-7315), we should deprecate in 0.94. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-7343) Fix flaky condition for TestDrainingServer
Himanshu Vashishtha created HBASE-7343: -- Summary: Fix flaky condition for TestDrainingServer Key: HBASE-7343 URL: https://issues.apache.org/jira/browse/HBASE-7343 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Priority: Minor The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement
[ https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530535#comment-13530535 ] Andrew Purtell commented on HBASE-7340: --- bq. I think we can either add notification per region moved, or enhance the following hook (at line 1335) with list of regions moved I'd +1 a patch which does that. Allow user-specified actions following region movement -- Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu Sometimes user performs compaction after a region is moved (by balancer). We should provide 'hook' which lets user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer
[ https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7343: --- Description: The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). {code} junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertFalse(Assert.java:34) at junit.framework.Assert.assertFalse(Assert.java:41) at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83) {code} This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. was: The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. Fix flaky condition for TestDrainingServer -- Key: HBASE-7343 URL: https://issues.apache.org/jira/browse/HBASE-7343 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7343.patch The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). {code} junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertFalse(Assert.java:34) at junit.framework.Assert.assertFalse(Assert.java:41) at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83) {code} This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. 
[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer
[ https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7343: --- Assignee: Himanshu Vashishtha Status: Patch Available (was: Open) Fix flaky condition for TestDrainingServer -- Key: HBASE-7343 URL: https://issues.apache.org/jira/browse/HBASE-7343 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7343.patch The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). {code} junit.framework.AssertionFailedError at junit.framework.Assert.fail(Assert.java:48) at junit.framework.Assert.assertTrue(Assert.java:20) at junit.framework.Assert.assertFalse(Assert.java:34) at junit.framework.Assert.assertFalse(Assert.java:41) at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83) {code} This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer
[ https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Himanshu Vashishtha updated HBASE-7343: --- Attachment: HBASE-7343.patch Tested in a loop and it passes. Fix flaky condition for TestDrainingServer -- Key: HBASE-7343 URL: https://issues.apache.org/jira/browse/HBASE-7343 Project: HBase Issue Type: Bug Components: test Affects Versions: 0.94.3 Reporter: Himanshu Vashishtha Priority: Minor Attachments: HBASE-7343.patch The assert statement in setUpBeforeClass() may fail in case the region distribution is not even (a particular rs has 0 regions). This is already fixed in trunk with HBASE-5992, but as that's a bigger change and uses 5877, this jira fixes that issue instead of backporting 5992. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530540#comment-13530540 ] Sergey Shelukhin commented on HBASE-7268: - Yes, as long as the clock on the master doesn't act funny. From my experience clocks cannot be trusted... maybe if we had a reliable sequence mechanism of some kind. In the original run, it happened due to multiple threads: one thread errors out on A with "moved to B", errors out on B, goes to META, and updates the cache with C; meanwhile, some other thread has just errored out on A with "moved to B", so it goes and overwrites C with B again. correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.
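The race described in this comment (a late "moved to B" overwriting the fresher location C) and the "reliable sequence mechanism" it wishes for can be sketched as follows. This is a minimal illustration, not the attached patch: `RegionLocationCache`, `Location` and the `seq` field are hypothetical, and the sketch assumes every location report carries a number that only grows each time the region is opened somewhere, so that ordering does not depend on server clocks.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client-side cache (not HBase code): a report only wins if it is newer. */
class RegionLocationCache {
    static final class Location {
        final String server;
        final long seq; // assumed to increase on every region open; not a wall-clock time

        Location(String server, long seq) {
            this.server = server;
            this.seq = seq;
        }
    }

    private final ConcurrentHashMap<String, Location> cache = new ConcurrentHashMap<>();

    /** Applies a location report; returns true only if the cache entry was replaced. */
    boolean update(String region, String server, long seq) {
        Location incoming = new Location(server, seq);
        // merge() runs atomically per key, so two racing threads cannot interleave here.
        Location winner = cache.merge(region, incoming,
                (current, candidate) -> candidate.seq > current.seq ? candidate : current);
        return winner == incoming;
    }

    /** Returns the cached server for the region, or null if nothing is cached. */
    String locate(String region) {
        Location location = cache.get(region);
        return location == null ? null : location.server;
    }
}
```

Replaying the scenario from the comment: the cache holds A (sequence 1), one thread learns of C from META (sequence 3) and updates the cache, then a second thread arrives with the stale "moved to B" (sequence 2). The second update is rejected and the cache keeps pointing at C.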
[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server
[ https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530542#comment-13530542 ] Sergey Shelukhin commented on HBASE-7268: - Faulty removal due to errors happens in the same way... I think sleeping after we get the location is also not good in that sense: we get some server and sleep, then go to that server (on retries); in the time we sleep, the region can move ten times. correct local region location cache information can be overwritten w/stale information from an old server - Key: HBASE-7268 URL: https://issues.apache.org/jira/browse/HBASE-7268 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Priority: Minor Fix For: 0.96.0 Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch, HBASE-7268-v1.patch, HBASE-7268-v2.patch Discovered via HBASE-7250; related to HBASE-5877. Test is writing from multiple threads. Server A has region R; client knows that. R gets moved from A to server B. B gets killed. R gets moved by master to server C. ~15 seconds later, client tries to write to it (on A?). Multiple client threads report, from the RegionMoved exception processing logic, "R moved from C to B", even though such a transition never happened (neither in nor before the sequence described below). Not quite sure how the client learned of the transition to C, I assume it's from meta from some other thread... Then, put fails (it may fail due to accumulated errors that are not logged, which I am investigating... but the bogus cache update is there notwithstanding). I have a patch but am not sure if it works; the test still fails locally for an as-yet-unknown reason.
[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement
[ https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530554#comment-13530554 ] Ted Yu commented on HBASE-7340: --- @Andy: Can you clarify which of the two choices listed you favor? If we add a notification per region moved, HMaster.balance() may move fewer regions compared to the current code, since we don't know how much time each notification may take. Allow user-specified actions following region movement -- Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu Sometimes a user performs a compaction after a region is moved (by the balancer). We should provide a 'hook' which lets the user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+
[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement
[ https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530563#comment-13530563 ] Andrew Purtell commented on HBASE-7340: --- bq. Can you clarify which of the two choices listed you favor ? postBalance is just a notification, so passing a list once makes sense. Allow user-specified actions following region movement -- Key: HBASE-7340 URL: https://issues.apache.org/jira/browse/HBASE-7340 Project: HBase Issue Type: Bug Reporter: Ted Yu Sometimes user performs compaction after a region is moved (by balancer). We should provide 'hook' which lets user specify what follow-on actions to take after region movement. See discussion on user mailing list under the thread 'How to know it's time for a major compaction?' for background information: http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+ -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
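The option favored here, a single post-balance notification that carries the list of moved regions, might look roughly like the sketch below. All names (`RegionMove`, `BalanceObserver`, `SketchBalancer`) are hypothetical and chosen for illustration; this does not reproduce the real coprocessor hook signature, and the actual region move is left as a comment.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical record of one region move (not an HBase class). */
final class RegionMove {
    final String region;
    final String from;
    final String to;

    RegionMove(String region, String from, String to) {
        this.region = region;
        this.from = from;
        this.to = to;
    }
}

/** Hypothetical hook: called once per balancer run with everything that moved. */
interface BalanceObserver {
    void postBalance(List<RegionMove> moves);
}

/** Hypothetical balancer driver showing where the single notification fires. */
class SketchBalancer {
    private final List<BalanceObserver> observers = new ArrayList<>();

    void register(BalanceObserver observer) {
        observers.add(observer);
    }

    void balance(List<RegionMove> plans) {
        List<RegionMove> moved = new ArrayList<>();
        for (RegionMove plan : plans) {
            // A real balancer would execute the move here; the sketch only records it.
            moved.add(plan);
        }
        // One notification after all moves, so slow observers cannot stall the move loop.
        List<RegionMove> view = Collections.unmodifiableList(moved);
        for (BalanceObserver observer : observers) {
            observer.postBalance(view);
        }
    }
}
```

Passing the list once keeps the time spent in user code out of the move loop, which addresses the concern that per-region notifications could reduce how many regions a balance run gets through. An observer that wants a follow-on action, such as requesting a compaction for each moved region, can iterate the list itself.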