date:20121212

[
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529730#comment-13529730
]

Lars Hofhansl commented on HBASE-7336:
--

bq. Compactions should go get their own Reader?
That sounds like a safe and important improvement.

In other cases it actually seems best to try to get a stream and fall back to
pread if that fails.

Could drive the number of readers by the size of the store file, something like
a reader per n GB (n = 1 or 2 maybe). Then we round robin the readers.
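
As a rough illustration of the sizing idea (one reader per n GB, handed out
round robin), here is a minimal Java sketch. ReaderPool, readerCount and next
are hypothetical names, not HBase classes, and the generic R stands in for
whatever reader type would actually be pooled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not an HBase class: hands out pooled readers in turn.
final class ReaderPool<R> {
    private final List<R> readers;
    private final AtomicInteger cursor = new AtomicInteger();

    ReaderPool(List<R> readers) {
        if (readers.isEmpty()) {
            throw new IllegalArgumentException("need at least one reader");
        }
        this.readers = new ArrayList<>(readers);
    }

    // One reader per bytesPerReader of store file, rounded up, never fewer than one.
    static int readerCount(long storeFileBytes, long bytesPerReader) {
        long n = (storeFileBytes + bytesPerReader - 1) / bytesPerReader;
        return (int) Math.max(1, Math.min(n, Integer.MAX_VALUE));
    }

    // Round-robin selection; floorMod keeps the index valid after int overflow.
    R next() {
        int i = Math.floorMod(cursor.getAndIncrement(), readers.size());
        return readers.get(i);
    }
}
```

With n = 1 GB, a 2.5 GB store file would get three readers and a tiny file
would still get one.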

Should I commit this for now (assuming it passes HadoopQA and no objections),
and we investigate other options further? Or discuss a bit more to see if we
find other options?

HFileBlock.readAtOffset does not work well with multiple threads

Key: HBASE-7336
URL: https://issues.apache.org/jira/browse/HBASE-7336
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
Fix For: 0.96.0, 0.94.4

Attachments: 7336-0.94.txt, 7336-0.96.txt

HBase grinds to a halt when many threads scan along the same set of blocks
and neither short-circuit reads nor block caching is enabled for the dfs
client ... disabling the block cache makes sense on very large scans.
It turns out that synchronizing on istream in HFileBlock.readAtOffset is the
culprit.
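
To make the contention concrete, here is a small sketch using plain java.nio
rather than the HDFS client (an assumption made so the example is
self-contained). OffsetReads, seekAndRead and pread are illustrative names.
The first method has to hold a lock because the channel position is shared
state; the second passes the offset with the read and needs no lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Illustrative only: contrasts a locked seek + read with a positional read.
final class OffsetReads {
    // Stream-style read: seek and read must happen atomically, so every
    // caller sharing this channel is serialized on the same lock.
    static byte[] seekAndRead(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(offset);
            while (buf.hasRemaining() && ch.read(buf) >= 0) {
                // keep reading until the buffer is full or EOF is reached
            }
        }
        return buf.array();
    }

    // Positional read ("pread"): the offset travels with the call and the
    // channel position is left untouched, so no lock is needed.
    static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n < 0) {
                break; // EOF: remaining bytes stay zero
            }
            pos += n;
        }
        return buf.array();
    }
}
```

Many scanner threads calling seekAndRead on one shared stream queue up behind
each other, which matches the symptom described above; pread avoids the queue
at the cost of whatever per-call overhead the underlying client adds.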

[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-12 Thread Gabriel Reid (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529742#comment-13529742
 ] 

Gabriel Reid commented on HBASE-7325:
-

I've tested it against the TestReplication unit tests, as well as doing some 
additional testing with HBaseTestingUtility to verify the expected performance 
improvement.

Indeed, I think the once-per-second load being put on the namenode should be a 
non-issue, and worth it for the gain that you get with faster replication on a 
quiet cluster.

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.
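
The numbers above can be checked with a small sketch. It assumes the
multiplier grows by one per consecutive backoff until it reaches the cap,
which is consistent with the 55 seconds quoted (1 + 2 + ... + 10).
ReplicationBackoff and its methods are hypothetical names, not the actual
ReplicationSource code.

```java
// Hypothetical sketch of the backoff arithmetic, not the ReplicationSource code.
final class ReplicationBackoff {
    private final long baseSleepMs;
    private final int maxMultiplier;
    private int multiplier = 1;

    ReplicationBackoff(long baseSleepMs, int maxMultiplier) {
        this.baseSleepMs = baseSleepMs;
        this.maxMultiplier = maxMultiplier;
    }

    // Error path: sleep grows by one base interval per consecutive failure, up to the cap.
    long nextSleepAfterError() {
        long sleep = baseSleepMs * multiplier;
        if (multiplier < maxMultiplier) {
            multiplier++;
        }
        return sleep;
    }

    // Proposed idle path: nothing to replicate is not an error, so do not back off.
    long nextSleepWhenIdle() {
        multiplier = 1;
        return baseSleepMs;
    }

    // Total time spent sleeping while the multiplier ramps from 1 to the cap.
    static long rampUpMillis(long baseSleepMs, int maxMultiplier) {
        long total = 0;
        for (int m = 1; m <= maxMultiplier; m++) {
            total += baseSleepMs * m;
        }
        return total;
    }
}
```

With the defaults mentioned (1 second base, multiplier capped at 10), the
ramp-up sums to 55 seconds, after which each sleep lasts 10 seconds.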

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data

2012-12-12 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529746#comment-13529746
 ] 

ramkrishna.s.vasudevan commented on HBASE-7337:
---

Did you check your values? For example, are the inserted values Strings, and 
is the value that you are querying also a String?
Just to verify...

 SingleColumnValueFilter seems to get unavailble data
 

 Key: HBASE-7337
 URL: https://issues.apache.org/jira/browse/HBASE-7337
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.3, 0.96.0
 Environment: 0.94
Reporter: Zhou wenjian
Assignee: Zhou wenjian
 Fix For: 0.96.0, 0.94.4


 Put multiple versions of a row:
 r1 cf:q  version:1 value:1
 r1 cf:q  version:2 value:3
 r1 cf:q  version:3 value:2
 The filter in the scan is set as below:
 SingleColumnValueFilter valueF = new SingleColumnValueFilter(
 family,qualifier,CompareOp.EQUAL,new BinaryComparator(Bytes
 .toBytes(2)));
 Then I found that all three versions are emitted. I then set 
 latestVersionOnly to false, and the result does not change.
   public ReturnCode filterKeyValue(KeyValue keyValue) {
     // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" +
     // Bytes.toString(keyValue.getValue()));
     if (this.matchedColumn) {
       // We already found and matched the single column, all keys now pass
       return ReturnCode.INCLUDE;
     } else if (this.latestVersionOnly && this.foundColumn) {
       // We found but did not match the single column, skip to next row
       return ReturnCode.NEXT_ROW;
     }
     if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
       return ReturnCode.INCLUDE;
     }
     foundColumn = true;
     if (filterColumnValue(keyValue.getBuffer(),
         keyValue.getValueOffset(), keyValue.getValueLength())) {
       return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
     }
     this.matchedColumn = true;
     return ReturnCode.INCLUDE;
   }
 From the code above, it seems that version 3 will be emitted first and set 
 matchedColumn to false, which leads to versions 2 and 1 being emitted too.


[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove


[ 
https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529751#comment-13529751
 ] 

Hudson commented on HBASE-7328:
---

Integrated in HBase-0.94 #622 (See 
[https://builds.apache.org/job/HBase-0.94/622/])
HBASE-7328 IntegrationTestRebalanceAndKillServersTargeted supercedes 
IntegrationTestRebalanceAndKillServers, remove (Revision 1420545)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/IntegrationTestRebalanceAndKillServers.java


 IntegrationTestRebalanceAndKillServersTargeted supercedes 
 IntegrationTestRebalanceAndKillServers, remove
 

 Key: HBASE-7328
 URL: https://issues.apache.org/jira/browse/HBASE-7328
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-7328-v0.patch





[jira] [Updated] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data

2012-12-12 Thread Zhou wenjian (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhou wenjian updated HBASE-7337:


Description: 
Put multiple versions of a row:
r1 cf:q  version:1 value:1
r1 cf:q  version:2 value:3
r1 cf:q  version:3 value:2
The filter in the scan is set as below:
SingleColumnValueFilter valueF = new SingleColumnValueFilter(
family,qualifier,CompareOp.EQUAL,new BinaryComparator(Bytes
.toBytes(2)));

Then I found that all three versions are emitted. I then set 
latestVersionOnly to false, and the result does not change.


  public ReturnCode filterKeyValue(KeyValue keyValue) {
    // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" +
    // Bytes.toString(keyValue.getValue()));
    if (this.matchedColumn) {
      // We already found and matched the single column, all keys now pass
      return ReturnCode.INCLUDE;
    } else if (this.latestVersionOnly && this.foundColumn) {
      // We found but did not match the single column, skip to next row
      return ReturnCode.NEXT_ROW;
    }
    if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
      return ReturnCode.INCLUDE;
    }
    foundColumn = true;
    if (filterColumnValue(keyValue.getBuffer(),
        keyValue.getValueOffset(), keyValue.getValueLength())) {
      return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
    }
    this.matchedColumn = true;
    return ReturnCode.INCLUDE;
  }

From the code above, it seems that version 3 will be emitted first and set 
matchedColumn to true, which leads to versions 2 and 1 being emitted too.
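
A stand-alone sketch of the same control flow, assuming cells of the tested
column reach the filter newest version first (the order described above) and
using plain ints in place of KeyValues. SingleColumnFilterSketch is an
illustrative class, not the HBase filter, and it only models cells that
belong to the tested column.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the quoted filterKeyValue logic, using ints as values.
final class SingleColumnFilterSketch {
    enum ReturnCode { INCLUDE, NEXT_ROW }

    private final int expected;
    private final boolean latestVersionOnly;
    private boolean foundColumn = false;
    private boolean matchedColumn = false;

    SingleColumnFilterSketch(int expected, boolean latestVersionOnly) {
        this.expected = expected;
        this.latestVersionOnly = latestVersionOnly;
    }

    // Mirrors the branches of the quoted method for one cell of the tested column.
    ReturnCode filterValue(int value) {
        if (matchedColumn) {
            return ReturnCode.INCLUDE;        // already matched: everything passes
        } else if (latestVersionOnly && foundColumn) {
            return ReturnCode.NEXT_ROW;       // latest version seen and it did not match
        }
        foundColumn = true;
        if (value != expected) {
            return latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
        }
        matchedColumn = true;
        return ReturnCode.INCLUDE;
    }

    // Feeds the versions of one row, newest first, and records each decision.
    static List<ReturnCode> run(int expected, boolean latestVersionOnly, int... newestFirst) {
        SingleColumnFilterSketch f = new SingleColumnFilterSketch(expected, latestVersionOnly);
        List<ReturnCode> out = new ArrayList<>();
        for (int v : newestFirst) {
            out.add(f.filterValue(v));
        }
        return out;
    }
}
```

With the data above, run(2, true, 2, 3, 1) yields INCLUDE for all three
versions: the newest cell matches, matchedColumn becomes true, and the first
branch then passes every later cell. Setting latestVersionOnly to false gives
the same result, which agrees with the report.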



  was:
Put multiple versions of a row:
r1 cf:q  version:1 value:1
r1 cf:q  version:2 value:3
r1 cf:q  version:3 value:2
The filter in the scan is set as below:
SingleColumnValueFilter valueF = new SingleColumnValueFilter(
family,qualifier,CompareOp.EQUAL,new BinaryComparator(Bytes
.toBytes(2)));

Then I found that all three versions are emitted. I then set 
latestVersionOnly to false, and the result does not change.


  public ReturnCode filterKeyValue(KeyValue keyValue) {
    // System.out.println("REMOVE KEY=" + keyValue.toString() + ", value=" +
    // Bytes.toString(keyValue.getValue()));
    if (this.matchedColumn) {
      // We already found and matched the single column, all keys now pass
      return ReturnCode.INCLUDE;
    } else if (this.latestVersionOnly && this.foundColumn) {
      // We found but did not match the single column, skip to next row
      return ReturnCode.NEXT_ROW;
    }
    if (!keyValue.matchingColumn(this.columnFamily, this.columnQualifier)) {
      return ReturnCode.INCLUDE;
    }
    foundColumn = true;
    if (filterColumnValue(keyValue.getBuffer(),
        keyValue.getValueOffset(), keyValue.getValueLength())) {
      return this.latestVersionOnly ? ReturnCode.NEXT_ROW : ReturnCode.INCLUDE;
    }
    this.matchedColumn = true;
    return ReturnCode.INCLUDE;
  }

From the code above, it seems that version 3 will be emitted first and set 
matchedColumn to false, which leads to versions 2 and 1 being emitted too.





[jira] [Commented] (HBASE-7337) SingleColumnValueFilter seems to get unavailble data

2012-12-12 Thread Zhou wenjian (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529773#comment-13529773
 ] 

Zhou wenjian commented on HBASE-7337:
-

they are both String




[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.

[
https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529782#comment-13529782
]

Hadoop QA commented on HBASE-7331:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560494/HBASE-7331_94.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 6 new
or modified tests.

{color:red}-1 patch{color}. The patch command could not apply the patch.

Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3491//console

This message is automatically generated.

Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow
and stop region server.
--

Key: HBASE-7331
URL: https://issues.apache.org/jira/browse/HBASE-7331
Project: HBase
Issue Type: Sub-task
Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
Fix For: 0.94.3, 0.96.0

Attachments: HBASE-7331_94.patch, HBASE-7331_trunk.patch

The following APIs in HRegionServer are either missing hooks to coprocessor
or the hooks are not implemented in the AccessController class for security.
As a result any unauthorized user can:
1. Open a region
2. Close a region
3. Stop region server
4. Lock a row
5. Unlock a row

[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads

[
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529785#comment-13529785
]

Hadoop QA commented on HBASE-7336:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560514/7336-0.96.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include
any new or modified tests.
Please justify why no new tests are needed for this
patch.
Also please list what manual steps were performed to
verify this patch.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
104 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 23 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestMultiParallel

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3490//console

This message is automatically generated.


[jira] [Commented] (HBASE-7334) We should expire the zk session for crashed servers rather than deleting ephemeral znodes

2012-12-12 Thread nkeywal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529844#comment-13529844
 ] 

nkeywal commented on HBASE-7334:


Is there a security impact if we keep the password in a file? We can do some 
obfuscation, but it will have to be readable, at least by the user account 
used to start/stop hbase.

 We should expire the zk session for crashed servers rather than deleting 
 ephemeral znodes
 -

 Key: HBASE-7334
 URL: https://issues.apache.org/jira/browse/HBASE-7334
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, Zookeeper
Affects Versions: 0.96.0
Reporter: Enis Soztutar

 For faster recovery, HBASE-5844 and HBASE-5926 added logic to delete the 
 ephemeral znodes for the master and region server from the hbase-daemon.sh 
 script. However, the master and RSs have other ephemeral nodes that are not 
 cleaned (for example region splitting, table lock).
 Instead of deleting the main znode, we can just invalidate the zookeeper 
 session by doing something like HBaseTestingUtility.expireSession(). 
 For this we need to keep the zk.getSessionId() and zk.getSessionPasswd() 
 around (write to a local file), keep the file updated for reconnections, and 
 once we know that the zk session is gone in ZNodeClearer, we can just create 
 a new session with the same credentials, and close that one, effectively 
 causing zk to delete all ephemeral nodes for the session. 


[jira] [Commented] (HBASE-7205) Coprocessor classloader is replicated for all regions in the HRegionServer


[ 
https://issues.apache.org/jira/browse/HBASE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529882#comment-13529882
 ] 

Hudson commented on HBASE-7205:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-7205 Coprocessor classloader is replicated for all regions in the 
HRegionServer (Ted Yu and Adrian Muraru) (Revision 1420480)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorClassLoader.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/CoprocessorHost.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestClassLoading.java


 Coprocessor classloader is replicated for all regions in the HRegionServer
 --

 Key: HBASE-7205
 URL: https://issues.apache.org/jira/browse/HBASE-7205
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.92.2, 0.94.2
Reporter: Adrian Muraru
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7205-0.94.txt, 7205-v10.txt, 7205-v1.txt, 7205-v3.txt, 
 7205-v4.txt, 7205-v5.txt, 7205-v6.txt, 7205-v7.txt, 7205-v8.txt, 7205-v9.txt, 
 HBASE-7205_v2.patch


 HBASE-6308 introduced a new custom CoprocessorClassLoader to load the 
 coprocessor classes, and a new instance of this CL is created for every 
 HRegion opened. This leads to OOME-PermGen when the number of regions goes 
 above hundreds per region server. 
 Having the table coprocessor jailed in a separate classloader is good; however, 
 we should create only one for all regions of a table in each HRS.
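
One possible shape for the sharing, as a sketch only: cache one loader per
coprocessor jar and hand the same instance to every region that asks.
SharedClassLoaderCache and loaderFor are hypothetical names, and this is not
the committed patch. The cache holds strong references, so a real
implementation would also need to decide when a loader may be released.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache: one class loader per coprocessor jar instead of one per region.
final class SharedClassLoaderCache {
    private final ConcurrentMap<String, ClassLoader> loaders = new ConcurrentHashMap<>();
    private final ClassLoader parent;

    SharedClassLoaderCache(ClassLoader parent) {
        this.parent = parent;
    }

    // Returns the loader for this jar, creating it only on the first request.
    ClassLoader loaderFor(URL jar) {
        return loaders.computeIfAbsent(jar.toExternalForm(),
            key -> new URLClassLoader(new URL[] { jar }, parent));
    }

    // Number of distinct loaders created so far.
    int size() {
        return loaders.size();
    }
}
```

Opening hundreds of regions of the same table would then create a single
loader, so the loaded classes occupy PermGen once rather than once per region.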


[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove


[ 
https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529881#comment-13529881
 ] 

Hudson commented on HBASE-7328:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-7328 IntegrationTestRebalanceAndKillServersTargeted supercedes 
IntegrationTestRebalanceAndKillServers, remove (Revision 1420543)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-it/src/test/java/org/apache/hadoop/hbase/IntegrationTestRebalanceAndKillServers.java



[jira] [Commented] (HBASE-5258) Move coprocessors set out of RegionLoad


[ 
https://issues.apache.org/jira/browse/HBASE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529880#comment-13529880
 ] 

Hudson commented on HBASE-5258:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #293 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/293/])
HBASE-5258 Move coprocessors set out of RegionLoad - Addendum (Sergey) 
(Revision 1420521)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java


 Move coprocessors set out of RegionLoad
 ---

 Key: HBASE-5258
 URL: https://issues.apache.org/jira/browse/HBASE-5258
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Sergey Shelukhin
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-5258-094.patch, HBASE-5258-fix-on-top-of-v1.patch, 
 HBASE-5258-v0.patch, HBASE-5258-v1.patch


 When I worked on HBASE-5256, I revisited the code related to Ser/De of 
 coprocessors set in RegionLoad.
 I think the rationale for embedding coprocessors set is for maximum 
 flexibility where each region can load different coprocessors.
 This flexibility is causing extra cost in the region server to Master 
 communication and increasing the footprint of Master heap.
 Would HServerLoad be a better place for this set?
 If required, the region server should calculate the disparity of loaded 
 coprocessors among regions and send a report through HServerLoad.


[jira] [Commented] (HBASE-7211) Improve hbase ref guide for the testing part.

2012-12-12 Thread nkeywal (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13529986#comment-13529986
]

nkeywal commented on HBASE-7211:

I will commit Jeffrey's patch tomorrow if there is no objection.

Improve hbase ref guide for the testing part.
-

Key: HBASE-7211
URL: https://issues.apache.org/jira/browse/HBASE-7211
Project: HBase
Issue Type: Bug
Components: documentation
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Priority: Minor
Attachments: hbase-7211-partial.patch

Here is some stuff I saw. I will propose a fix in a week or so, please add
any comments or issues you have in mind.
??15.6.1. Apache HBase Modules??
= We should be able to use categories in all modules. The default should be
small; but any test manipulating the time needs to be in a specific jvm
(hence medium), so it's not always related to minicluster.
??15.6.3.6. hbasetests.sh??
= We can remove this chapter, and the script
The script is not totally useless, but I think nobody actually uses it.
= Add a chapter on flakiness.
Some tests are, unfortunately, flaky. While their number decreases, we still
have some. Rules are:
- don't write flaky tests! :-)
- small tests cannot be flaky, as it blocks other test execution. Corollary:
if you have an issue with a small test, it's either your environment or a
severe issue.
- rerun the test a few times to validate, check the ports and file descriptors
used.
??mvn test -P localTests -Dtest=MyTest??
= We could actually activate the localTests profile whenever -Dtest is used.
If we do that, we can remove the reference from localTests in the doc.
??mvn test -P runSmallTests?? ??mvn test -P runMediumTests??
= I'm not sure it's actually used. We could remove them from the pom.xml
(and the doc).
??The HBase build uses a patched version of the maven surefire plugin??
= Hopefully, we will be able to remove this soon :-)
??Integration tests are described TODO: POINTER_TO_INTEGRATION_TEST_SECTION??
= Should be documented

[jira] [Commented] (HBASE-7313) ColumnPaginationFilter should reset count when moving to NEXT_ROW

[
https://issues.apache.org/jira/browse/HBASE-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530005#comment-13530005
]

Hadoop QA commented on HBASE-7313:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560213/7313-trunk.txt
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
104 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 23 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.filter.TestFilter

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3493//console

This message is automatically generated.

ColumnPaginationFilter should reset count when moving to NEXT_ROW
-

Key: HBASE-7313
URL: https://issues.apache.org/jira/browse/HBASE-7313
Project: HBase
Issue Type: Bug
Components: Filters
Affects Versions: 0.94.3, 0.96.0
Reporter: Varun Sharma
Assignee: Varun Sharma
Fix For: 0.96.0, 0.94.4

Attachments: 7313-0.94.txt, 7313-trunk.txt

ColumnPaginationFilter does not reset count to zero on moving to the next row.
Hence, once we have already returned the limit number of columns, the
subsequent rows will always return 0 columns.
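
A minimal stand-alone sketch of the counting logic and of the proposed reset,
under the assumption that the filter is consulted once per column and is told
when a row ends. ColumnPaginationSketch, filterColumn and includedInRow are
illustrative names, not the HBase API.

```java
// Illustrative model of per-row column pagination with an explicit reset.
final class ColumnPaginationSketch {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    private final int limit;
    private final int offset;
    private int count = 0;

    ColumnPaginationSketch(int limit, int offset) {
        this.limit = limit;
        this.offset = offset;
    }

    // Called once per column of the current row.
    ReturnCode filterColumn() {
        if (count >= offset + limit) {
            return ReturnCode.NEXT_ROW;   // page is full, nothing more from this row
        }
        ReturnCode code = count < offset ? ReturnCode.SKIP : ReturnCode.INCLUDE;
        count++;
        return code;
    }

    // The fix under discussion: start counting again at each row boundary.
    void reset() {
        count = 0;
    }

    // Feeds one row of the given width and counts how many columns were included.
    static int includedInRow(ColumnPaginationSketch f, int columns) {
        int included = 0;
        for (int c = 0; c < columns; c++) {
            ReturnCode code = f.filterColumn();
            if (code == ReturnCode.NEXT_ROW) {
                break;
            }
            if (code == ReturnCode.INCLUDE) {
                included++;
            }
        }
        return included;
    }
}
```

With limit 2 and offset 0, the first row yields two columns. Without a call
to reset() the second row yields none, which is the reported behaviour; with
reset() between rows each row yields two again.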

[jira] [Commented] (HBASE-7315) Remove support for client-side RowLocks

[
https://issues.apache.org/jira/browse/HBASE-7315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530027#comment-13530027
]

Hadoop QA commented on HBASE-7315:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560447/HBASE-7315-v2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 18 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
105 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 21 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3492//console

This message is automatically generated.

Remove support for client-side RowLocks
---

Key: HBASE-7315
URL: https://issues.apache.org/jira/browse/HBASE-7315
Project: HBase
Issue Type: Sub-task
Components: Transactions/MVCC
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Fix For: 0.96.0

Attachments: HBASE-7315.patch, HBASE-7315-v2.patch

See comments in HBASE-7263.

[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition

2012-12-12 Thread ramkrishna.s.vasudevan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530062#comment-13530062
 ] 

ramkrishna.s.vasudevan commented on HBASE-7335:
---

@Kyle
Could you attach the logs from the time of the split? Is that possible?

 Failed split can cause a region to get stuck in transition
 --

 Key: HBASE-7335
 URL: https://issues.apache.org/jira/browse/HBASE-7335
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.1
Reporter: Kyle McGovern

 Trying to reassign a region after a failed split causes that region to get 
 stuck in transition. 
 hdfs dfs -R output
 http://pastebin.com/F4DgTxj1
 hbck output
 http://pastebin.com/BaftESBd
 error on regionserver
 http://pastebin.com/Mye60rUA
 For example, if I remove
 /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590
 the region will successfully assign and hbck does not show errors for this
 region anymore. The contents of the file appear to just be a split key.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads


[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530074#comment-13530074
 ] 

Lars Hofhansl commented on HBASE-7336:
--

TestMultiParallel passed locally.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither short-circuit reads nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing on istream in HFileBlock.readAtOffset is the 
 culprit.


[jira] [Commented] (HBASE-7205) Coprocessor classloader is replicated for all regions in the HRegionServer

2012-12-12 Thread Adrian Muraru (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530088#comment-13530088
 ] 

Adrian Muraru commented on HBASE-7205:
--

Lars, you're right: apparently there is one thread keeping a strong reference to 
our custom classloader. The thing is that this seems to be a junit thread; when 
I test manually with HBase standalone by enabling/disabling a multi-region 
table, I can see these instances GC'ed. 
Not 100% sure, but I suspect junit is doing some sort of class loading 
accounting (for reporting purposes or so) and keeps these references.

 Coprocessor classloader is replicated for all regions in the HRegionServer
 --

 Key: HBASE-7205
 URL: https://issues.apache.org/jira/browse/HBASE-7205
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.92.2, 0.94.2
Reporter: Adrian Muraru
Assignee: Ted Yu
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7205-0.94.txt, 7205-v10.txt, 7205-v1.txt, 7205-v3.txt, 
 7205-v4.txt, 7205-v5.txt, 7205-v6.txt, 7205-v7.txt, 7205-v8.txt, 7205-v9.txt, 
 HBASE-7205_v2.patch


 HBASE-6308 introduced a new custom CoprocessorClassLoader to load the 
 coprocessor classes, and a new instance of this CL is created for every 
 HRegion opened. This leads to OOME-PermGen when the number of regions goes 
 above hundreds per region server. 
 Having the table coprocessor jailed in a separate classloader is good; however, 
 we should create only one for all regions of a table in each HRS.
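
 A rough sketch of the "one classloader for all regions" idea, assuming the loader can be keyed by the coprocessor jar path. SharedClassLoaderCache is a hypothetical name and this is not what the attached patches necessarily do; it only illustrates handing the same loader to every region that asks for the same jar.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache, not part of HBase: one ClassLoader per jar path,
// shared by every region that requests that path.
final class SharedClassLoaderCache {
    private final ConcurrentMap<String, ClassLoader> loaders =
        new ConcurrentHashMap<>();

    /** Returns the loader already cached for jarPath, creating it only once. */
    ClassLoader getOrCreate(String jarPath, ClassLoader parent) {
        // The empty URL array keeps the sketch self-contained; a real loader
        // would be given the URL of the jar at jarPath.
        return loaders.computeIfAbsent(jarPath,
            path -> new URLClassLoader(new URL[0], parent));
    }

    /** Number of distinct loaders created so far. */
    int size() {
        return loaders.size();
    }
}
```

 Note that this map holds strong references, so loaders are never collected while the cache lives; whether that is acceptable, or weak references are needed, is a separate design question.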


[jira] [Commented] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations

2012-12-12 Thread Gary Helmling (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530091#comment-13530091
 ] 

Gary Helmling commented on HBASE-7326:
--

bq. Could we get rid of SortedCopyOnWriteSet Gary for CSLS?

That's the idea.  Given the lack of locking for CSLS, it may be just as low 
overhead for iteration and would actually be fully thread safe.  In which case, 
let's dump SortedCopyOnWriteSet if it doesn't buy us anything.

 SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
 -

 Key: HBASE-7326
 URL: https://issues.apache.org/jira/browse/HBASE-7326
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.2, 0.94.3, 0.96.0
Reporter: Gary Helmling

 The SortedCopyOnWriteSet implementation uses an internal TreeSet that is 
 copied and replaced on mutation operations.  However, in a few areas, 
 SortedCopyOnWriteSet leaks references to the underlying TreeSet 
 implementations, allowing for unsafe usage:
 * iterator()
 * subSet()
 * headSet()
 * tailSet()
 For Iterator.remove(), we can wrap in an implementation that throws 
 UnsupportedOperationException.  For the sub set methods, we could return new 
 SortedCopyOnWriteSet instances (which would not modify the parent set), or 
 wrap with a new sub set implementation that safely allows modification of the 
 parent set.
 To be clear, the current usage of SortedCopyOnWriteSet does not make use of 
 any of these non-thread-safe methods, but the implementation should be fixed 
 to be completely thread safe and prevent any new issues.
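
 A minimal sketch of the Iterator.remove() wrapping suggested above. ReadOnlyIterator is a hypothetical name, not the class from any patch; it simply delegates hasNext()/next() and rejects remove(), so a caller cannot mutate the underlying TreeSet through the iterator.

```java
import java.util.Iterator;

// Hypothetical wrapper, not taken from HBase: exposes iteration over a
// delegate iterator but refuses removal.
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    ReadOnlyIterator(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public E next() {
        return delegate.next();
    }

    @Override
    public void remove() {
        // Removal would modify the shared internal set behind the
        // copy-on-write wrapper, so it is rejected outright.
        throw new UnsupportedOperationException("iterator is read-only");
    }
}
```

 SortedCopyOnWriteSet.iterator() could then return a wrapper like this around the internal set's iterator. The sub set methods need a different treatment, as described above.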


[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection

2012-12-12 Thread Hiroshi Ikeda (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530123#comment-13530123
 ] 

Hiroshi Ikeda commented on HBASE-7295:
--

It is meaningless to change the final instance variable PoolMap to volatile, 
because the visibility guarantees between threads apply only when 
you get/set the reference itself. Also, PoolMap is in fact not thread safe, so 
we cannot tell what will happen in the first place (HBASE-6651).


 Contention in HBaseClient.getConnection
 ---

 Key: HBASE-7295
 URL: https://issues.apache.org/jira/browse/HBASE-7295
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.3
Reporter: Varun Sharma
Assignee: Varun Sharma
 Fix For: 0.96.0, 0.94.4

 Attachments: 7295-0.94.txt, 7295-0.94-v2.txt, 7295-0.94-v3.txt, 
 7295-0.94-v4.txt, 7295-0.94-v5.txt, 7295-trunk.txt, 7295-trunk.txt, 
 7295-trunk-v2.txt, 7295-trunk-v3.txt, 7295-trunk-v3.txt


 HBaseClient.getConnection() synchronizes on the connections object. We found 
 severe contention on a thrift gateway which was fanning out roughly 3000+ 
 calls per second to hbase region servers. The thrift gateway had 2000+ 
 threads for handling incoming connections. Threads were blocked on the 
 synchronized block - we set ipc.pool.size to 200. Since we are using 
 RoundRobin/ThreadLocal pool only - it's not necessary to synchronize on 
 connections - it might lead to cases where we might go slightly over the 
 ipc.max.pool.size() but the additional connections would time out after 
 maxIdleTime - underlying PoolMap connections object is thread safe.


[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection

[
https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530128#comment-13530128
]

Lars Hofhansl commented on HBASE-7295:
--

Indeed. You're right of course.


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530139#comment-13530139
 ] 

Andrew Purtell commented on HBASE-7317:
---

{quote}
trunk pom already specifies htrace:
<htrace.version>1.49</htrace.version>
{quote}

Should we be depending on something that only has a single contributor and 
hasn't seen a commit in over three months?

 server-side request problems are hard to debug
 --

 Key: HBASE-7317
 URL: https://issues.apache.org/jira/browse/HBASE-7317
 Project: HBase
  Issue Type: Brainstorming
  Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

 I've seen cases during integration tests where the write or read request took 
 an unexpectedly large amount of time (that, after the client went to the 
 region server that is reported alive and well, which I know from temporary 
 debug logging :)), and it's impossible to understand what is going on on the 
 server side, short of catching the moment with jstack.
 Some solutions (off by default) could be 
 - a facility for tests (especially integration tests) that would trace 
 Server/Master calls into some log or file (won't help with internals but at 
 least one could see what was actually received);
 - logging the progress of requests between components inside master/server 
 (e.g. request id=N received, request id=N is being processed in MyClass, 
 N being drawn on client from local sequence - no guarantees of uniqueness are 
 necessary).
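
 The second suggestion could look roughly like the sketch below. RequestTrace and its methods are hypothetical names, not existing HBase code; the message formats follow the examples in the description, and the id comes from a plain process-local sequence, so it is not globally unique.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of HBase: draws request ids from a local
// sequence and formats progress messages for logging.
final class RequestTrace {
    // Process-local counter; ids may collide across clients, which the
    // description above says is acceptable.
    private static final AtomicLong SEQUENCE = new AtomicLong();

    private RequestTrace() {
    }

    /** Draws the next id on the client side. */
    static long nextRequestId() {
        return SEQUENCE.incrementAndGet();
    }

    /** Message to log when a component receives the request. */
    static String received(long id) {
        return "request id=" + id + " received";
    }

    /** Message to log when a component starts working on the request. */
    static String processing(long id, Class<?> where) {
        return "request id=" + id + " is being processed in "
            + where.getSimpleName();
    }
}
```

 The client would attach the id to the call, and each server-side component would log one of these lines (behind an off-by-default switch) so a slow request can be followed through the logs.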


[jira] [Commented] (HBASE-7295) Contention in HBaseClient.getConnection


[ 
https://issues.apache.org/jira/browse/HBASE-7295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530138#comment-13530138
 ] 

Lars Hofhansl commented on HBASE-7295:
--

In fact I had misread the whole patch (looked to me like we're checking and 
rechecking connections, but we're checking the connection we're retrieving from 
connections, hence Ted's comment about making that volatile).



[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530145#comment-13530145
 ] 

stack commented on HBASE-7317:
--

bq. Should we be depending on something that only has a single contributor and 
hasn't seen a commit in over three months?

Fair point.

Hope was that we'd add tracing to hbase w/ this as a start (and that hadoop 
itself would be adding trace I suppose so we could go down into datanodes).  If 
no progress on tracing before, say, 0.96, yeah, let's remove it.  But maybe there 
will be progress made in this issue.

Regarding a central collector for traces, could try writing to an hbase table.


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530153#comment-13530153
 ] 

Andrew Purtell commented on HBASE-7317:
---

bq. Fair point

We could also go in the other direction, reach out to Jon for a grant to port 
to Apache as Sergey said, and then carry it forward maintained in tree. It 
would need a sponsor, and work to make it useful along the lines that Todd and 
you suggest. Whether we have that is the question.


[jira] [Commented] (HBASE-7328) IntegrationTestRebalanceAndKillServersTargeted supercedes IntegrationTestRebalanceAndKillServers, remove


[ 
https://issues.apache.org/jira/browse/HBASE-7328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530163#comment-13530163
 ] 

Sergey Shelukhin commented on HBASE-7328:
-

Thanks!

 IntegrationTestRebalanceAndKillServersTargeted supercedes 
 IntegrationTestRebalanceAndKillServers, remove
 

 Key: HBASE-7328
 URL: https://issues.apache.org/jira/browse/HBASE-7328
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Trivial
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-7328-v0.patch





[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition

2012-12-12 Thread Kyle McGovern (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530173#comment-13530173
 ] 

Kyle McGovern commented on HBASE-7335:
--

[~ram_krish] I'm not sure exactly when the split failed so finding the logs 
might be difficult. Is there any string in particular I might be able to search 
for?


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530177#comment-13530177
 ] 

Todd Lipcon commented on HBASE-7317:


The license is already Apache, so if someone wants to make changes and send a 
pull request, I'm happy to pull them in and publish a new version of htrace. I 
don't think we need substantial changes to htrace itself - more work is 
remaining in the trace collection / viewing area.


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530180#comment-13530180
 ] 

Andrew Purtell commented on HBASE-7317:
---

bq. The license is already Apache, so if someone wants to make changes and send 
a pull request, I'm happy to pull them in and publish a new version of htrace.

Any chance of getting spans into HDFS with the current project hosting?


[jira] [Commented] (HBASE-7334) We should expire the zk session for crashed servers rather than deleting ephemeral znodes


[ 
https://issues.apache.org/jira/browse/HBASE-7334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530193#comment-13530193
 ] 

Enis Soztutar commented on HBASE-7334:
--

bq. Is there a security impact if we keep the password in a file? We can do 
some dissimulation, but it will have to be readable, at least by the user 
account used to start/stop hbase.
Good question. If we make that file readable only by the hbase user, it should 
be fine I think, since that user has access to the credentials anyway. 

 We should expire the zk session for crashed servers rather than deleting 
 ephemeral znodes
 -

 Key: HBASE-7334
 URL: https://issues.apache.org/jira/browse/HBASE-7334
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver, Zookeeper
Affects Versions: 0.96.0
Reporter: Enis Soztutar

 For faster recovery, HBASE-5844 and HBASE-5926 added logic to delete the 
 ephemeral znodes for the master and region server from the hbase-daemon.sh 
 script. However, the master and RSs have other ephemeral nodes that are not 
 cleaned (for example region splitting, table lock).
 Instead of deleting the main znode, we can just invalidate the zookeeper 
 session by doing something like HBaseTestingUtility.expireSession(). 
 For this we need to keep the zk.getSessionId() and zk.getSessionPasswd() 
 around (write to a local file), keep the file updated for reconnections, and 
 once we know that the zk session is gone in ZNodeClearer, we can just create 
 a new session with the same credentials, and close that one, effectively 
 causing zk to delete all ephemeral nodes for the session. 


[jira] [Commented] (HBASE-7305) ZK based Read/Write locks for table operations

[
https://issues.apache.org/jira/browse/HBASE-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530197#comment-13530197
]

Sergey Shelukhin commented on HBASE-7305:
-

After a cursory look at the patch, I have two questions...
1) I think I saw an article about a standard zk primitives library, and iirc even
discussed it with Enis. Is it the curator library meant above? If so we should
probably switch to it. Especially if it's easy to modify :)
2) More broadly, I wonder about the scalability impact of this. At the minimum,
locks need to be write-preference to prevent region servers on large clusters
from starving the clients and master forever (a separate problem is what to do
with stray region servers stuck holding a lock (ZK will take care of that?), but
many servers can starve master/clients by sheer force of numbers).

ZK based Read/Write locks for table operations
--

Key: HBASE-7305
URL: https://issues.apache.org/jira/browse/HBASE-7305
Project: HBase
Issue Type: Bug
Components: Client, master, Zookeeper
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Fix For: 0.96.0

Attachments: hbase-7305_v0.patch

This started as a forward port of HBASE-5494 and HBASE-5991 from the
89-fb branch to trunk, but diverged enough to have its own issue.
The idea is to implement a zk based read/write lock per table. Master
initiated operations should get the write lock, and region operations (region
split, moving, balance?, etc) acquire a shared read lock.

[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530203#comment-13530203
 ] 

Sergey Shelukhin commented on HBASE-7317:
-

Hmm, somehow I missed that in the book. That looks very useful :)
I have looked at the source a bit; is there any good way to add debug 
information to Span-s, e.g. exceptions/etc.?
As far as I understand it currently would trace operation starts/ends, right?
From the patch in the JIRA that adds the hooks, it looks like more hooks 
should be added.

Wrt placement, is there a reason to not put it into org.apache.common..., with 
only HDFS/HBase/etc. specific receivers living in their corresponding projects?
I can do it when I have bandwidth if there are no legal/procedural objections 
or objections from the author.



[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530204#comment-13530204
 ] 

Andrew Purtell commented on HBASE-7317:
---

bq. Wrt placement, is there a reason to not put it into org.apache.common..., 
with only HDFS/HBase/etc. specific receivers living in their corresponding 
projects? I can do it when I have bandwidth if there are no legal/procedural 
objections or objections from the author.

+1 to this

 server-side request problems are hard to debug
 --

 Key: HBASE-7317
 URL: https://issues.apache.org/jira/browse/HBASE-7317
 Project: HBase
  Issue Type: Brainstorming
  Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

 I've seen cases during integration tests where the write or read request took 
 an unexpectedly large amount of time (that, after the client went to the 
 region server that is reported alive and well, which I know from temporary 
 debug logging :)), and it's impossible to understand what is going on on the 
 server side, short of catching the moment with jstack.
 Some solutions (off by default) could be 
 - a facility for tests (especially integration tests) that would trace 
 Server/Master calls into some log or file (won't help with internals but at 
 least one could see what was actually received);
 - logging the progress of requests between components inside master/server 
 (e.g. request id=N received, request id=N is being processed in MyClass, 
 N being drawn on client from local sequence - no guarantees of uniqueness are 
 necessary).
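
The second idea in the list above (tagging each request with a client-drawn id and logging its progress) could look roughly like the sketch below. RequestTrace and its methods are hypothetical names used only for illustration; they are not an existing HBase facility, and the in-memory list merely stands in for whatever log or file would be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical tracer, off by default; not an HBase API. */
final class RequestTrace {
    // Local client-side sequence; ids only need to be "unique enough" for debugging.
    private static final AtomicLong NEXT_ID = new AtomicLong();

    private final boolean enabled;
    private final List<String> lines = new ArrayList<>();

    RequestTrace(boolean enabled) {
        this.enabled = enabled;
    }

    /** Draws the next request id from the local sequence. */
    static long nextRequestId() {
        return NEXT_ID.incrementAndGet();
    }

    /** Records one progress line for a request; does nothing when tracing is off. */
    synchronized void log(long requestId, String stage) {
        if (!enabled) {
            return;
        }
        lines.add("request id=" + requestId + " " + stage);
    }

    /** Returns a copy of the recorded lines (a stand-in for a real log file). */
    synchronized List<String> lines() {
        return new ArrayList<>(lines);
    }
}
```

A server component would call log(id, "received") on arrival and log(id, "is being processed in MyClass") as the request moves on, so that a slow request can be followed by its id.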


[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530206#comment-13530206
 ] 

Sergey Shelukhin commented on HBASE-7243:
-

+1

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can set a reasonable timeout on the test. 


[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530209#comment-13530209
]

Sergey Shelukhin commented on HBASE-7268:
-

ping? Thanks.
Do we want to consider supplying open timestamp from the master too?

correct local region location cache information can be overwritten w/stale
information from an old server
-

Key: HBASE-7268
URL: https://issues.apache.org/jira/browse/HBASE-7268
Project: HBase
Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
Fix For: 0.96.0

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch

Discovered via HBASE-7250; related to HBASE-5877.
Test is writing from multiple threads.
Server A has region R; client knows that.
R gets moved from A to server B.
B gets killed.
R gets moved by master to server C.
~15 seconds later, client tries to write to it (on A?).
Multiple client threads report, from the RegionMoved exception processing
logic, that R moved from C to B, even though no such transition ever happened
(neither in nor before the sequence described below). I am not quite sure how
the client learned of the transition to C; I assume it is from meta, via some
other thread...
Then the put fails (it may fail due to accumulated errors that are not logged,
which I am investigating... but the bogus cache update is there
notwithstanding).
I have a patch but am not sure whether it works; the test still fails locally
for a yet unknown reason.
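
One way to picture a guard against such stale updates is sketched below. It assumes every location update carries a monotonically increasing open timestamp, as floated in the comment above; that is an assumption for illustration and not a description of the attached patches. RegionLocationCache and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical client-side cache that refuses to go back to an older location. */
final class RegionLocationCache {

    static final class Location {
        final String server;
        final long openTimestamp; // assumed to grow every time the region is (re)opened

        Location(String server, long openTimestamp) {
            this.server = server;
            this.openTimestamp = openTimestamp;
        }
    }

    private final ConcurrentMap<String, Location> cache = new ConcurrentHashMap<>();

    /** Stores the location only if it is newer than the cached one; returns true if stored. */
    boolean update(String region, String server, long openTimestamp) {
        Location candidate = new Location(server, openTimestamp);
        Location winner = cache.merge(region, candidate,
            (current, incoming) ->
                incoming.openTimestamp > current.openTimestamp ? incoming : current);
        return winner == candidate;
    }

    /** Returns the cached server for the region, or null if nothing is cached. */
    String serverFor(String region) {
        Location location = cache.get(region);
        return location == null ? null : location.server;
    }
}
```

With such a check, a late "R moved to B" notification arriving after the client already knows about C would be dropped instead of overwriting the correct entry.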

[jira] [Commented] (HBASE-7326) SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations


[ 
https://issues.apache.org/jira/browse/HBASE-7326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530230#comment-13530230
 ] 

Ted Yu commented on HBASE-7326:
---

+1 on dropping SortedCopyOnWriteSet

 SortedCopyOnWriteSet is not thread safe due to leaked TreeSet implementations
 -

 Key: HBASE-7326
 URL: https://issues.apache.org/jira/browse/HBASE-7326
 Project: HBase
  Issue Type: Bug
  Components: util
Affects Versions: 0.92.2, 0.94.3, 0.96.0
Reporter: Gary Helmling

 The SortedCopyOnWriteSet implementation uses an internal TreeSet that is 
 copied and replaced on mutation operations.  However, in a few areas, 
 SortedCopyOnWriteSet leaks references to the underlying TreeSet 
 implementations, allowing for unsafe usage:
 * iterator()
 * subSet()
 * headSet()
 * tailSet()
 For Iterator.remove(), we can wrap in an implementation that throws 
 UnsupportedOperationException.  For the sub set methods, we could return new 
 SortedCopyOnWriteSet instances (which would not modify the parent set), or 
 wrap with a new sub set implementation that safely allows modification of the 
 parent set.
 To be clear, the current usage of SortedCopyOnWriteSet does not make use of 
 any of these non-thread-safe methods, but the implementation should be fixed 
 to be completely thread safe and prevent any new issues.
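
The Iterator.remove() suggestion above can be sketched as a thin wrapper. ReadOnlyIterator is a hypothetical name; how it would be wired into SortedCopyOnWriteSet.iterator() is left open here.

```java
import java.util.Iterator;

/** Hypothetical wrapper: delegates reads and refuses remove(). */
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> delegate;

    ReadOnlyIterator(Iterator<E> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public E next() {
        return delegate.next();
    }

    @Override
    public void remove() {
        // Never touch the underlying TreeSet through the iterator.
        throw new UnsupportedOperationException("remove() is not supported");
    }
}
```

Returning new ReadOnlyIterator<>(internalSet.iterator()) from iterator() would keep callers from mutating the shared TreeSet, assuming the internal set is otherwise only replaced, never modified in place.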


[jira] [Created] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange

Himanshu Vashishtha created HBASE-7338:
--

 Summary: Fix flaky condition for 
org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
 Key: HBASE-7338
 URL: https://issues.apache.org/jira/browse/HBASE-7338
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3, 0.96.0
Reporter: Himanshu Vashishtha
Priority: Minor


The balancer doesn't run while a region is in transition. The check to 
confirm whether all regions are assigned looks for region count  22, 
whereas the total number of regions is 27. This may result in a failure:
{code}
java.lang.AssertionError: After 5 attempts, region assignments were not 
balanced.
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.hadoop.hbase.TestRegionRebalancing.assertRegionsAreBalanced(TestRegionRebalancing.java:203)
at 
org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:123)

.
2012-12-11 13:47:02,231 INFO  [pool-1-thread-1] 
hbase.TestRegionRebalancing(120): Added fourth 
server=p0118.mtv.cloudera.com,44414,1355262422083
2012-12-11 13:47:02,231 INFO  
[RegionServer:3;p0118.mtv.cloudera.com,44414,1355262422083] 
regionserver.HRegionServer(3769): Registered RegionServer MXBean
2012-12-11 13:47:02,231 DEBUG [pool-1-thread-1] master.HMaster(987): Not 
running balancer because 1 region(s) in transition: 
{c786446fb2542f190e937057cdc79d9d=test,kkk,1355262401365.c786446fb2542f190e937057cdc79d9d.
 state=OPENING, ts=1355262421037, 
server=p0118.mtv.cloudera.com,54281,1355262419765}
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
hbase.TestRegionRebalancing(165): There are 4 servers and 26 regions. Load 
Average: 13.0 low border: 9, up border: 16; attempt: 0
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,51590,1355262395329 
Avg: 13.0 actual: 11
2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,52987,1355262407916 
Avg: 13.0 actual: 15
2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] 
hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,48044,1355262421787 
Avg: 13.0 actual: 0
2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] 
hbase.TestRegionRebalancing(179): p0118.mtv.cloudera.com,48044,1355262421787 
Isn't balanced!!! Avg: 13.0 actual: 0 slop: 0.2
2012-12-11 13:47:12,233 DEBUG [pool-1-thread-1] master.HMaster(987): Not 
running balancer because 1 region(s) in transition: 
{code}


[jira] [Updated] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange


 [ 
https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-7338:
---

Attachment: HBASE-7338.patch

Ran the test locally and it passes.


[jira] [Updated] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange


 [ 
https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-7338:
---

Status: Patch Available  (was: Open)


[jira] [Created] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

Jonathan Hsieh created HBASE-7339:
-

 Summary: Splitting a hfilelink causes region servers to go down.
 Key: HBASE-7339
 URL: https://issues.apache.org/jira/browse/HBASE-7339
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Blocker
 Fix For: hbase-6055



Steps:
- Have a single region table with 15 hfiles in it.
- Snapshot it.
- Clone a snapshot 
- region post-open task attempts to compact region.  policy does not compact 
all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed.
- it starts splitting
- creating split references, opening daughters fails 
- hfile links are split, creating hfile link daughter refs.  
hfile-region-table.parentregion
- these split hfile links are interpreted as hfile links with table 
table.parentregion
- Since this is after the splitting PONR, this aborts the server.  It then 
spreads to the next server.
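
The misinterpretation in the last steps can be reproduced with a deliberately simplified pattern. The regex below is an assumption made for illustration (hex hfile and region names, table names that may contain dots); it is not the pattern HBase actually uses, and LinkNameAmbiguity is a hypothetical class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical demo of why a split hfile link parses as a link to the wrong table. */
final class LinkNameAmbiguity {
    // Assumed link layout: <hfile>-<region>-<table>, where the table group accepts dots.
    static final Pattern LINK =
        Pattern.compile("([0-9a-f]+)-([0-9a-f]+)-([a-zA-Z_0-9.]+)");

    /** Returns the table name the link pattern extracts, or null if the name is not a link. */
    static String tableOf(String fileName) {
        Matcher m = LINK.matcher(fileName);
        return m.matches() ? m.group(3) : null;
    }
}
```

Under these assumptions, tableOf("abc123-def456-t1") yields the table t1, while the daughter reference abc123-def456-t1.0a1b2c still matches the link pattern and yields a table named t1.0a1b2c, which is the wrong grouping described in the steps.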


[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530264#comment-13530264
 ] 

Enis Soztutar commented on HBASE-7243:
--

Integration test class names should start with IntegrationTest; can you rename 
it? See http://hbase.apache.org/book/hbase.tests.html#integration.tests




[jira] [Commented] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

[
https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530267#comment-13530267
]

Jonathan Hsieh commented on HBASE-7339:
---

This was encountered when testing online snapshots, but will affect offline
snapshots as well.

Suggested solutions:
1) Make opening the hfile-link daughter reference more robust by attempting to
treat the file as a reference if treating it as a link fails. Hacky but should
work.
2) Make the regexes used to differentiate references and hfilelinks stricter
so that we can tell the two apart. Hacky, not sure if it will work.
3) Change the daughter reference link file name to be more robust. Currently
it is 'hfile.parentregion'; maybe change it to 'hfile@parentregion'. This would
then allow 'hfile-region-table@parentregion' to be interpreted
correctly. This is the right way but breaks compatibility.
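
A rough sketch of option 3 follows, assuming '@' can never appear in hfile, region, or table names (an assumption, not something verified here). DaughterRefName is a hypothetical helper, not existing HBase code.

```java
/** Hypothetical parser for the proposed 'name@parentregion' daughter reference layout. */
final class DaughterRefName {

    /**
     * Splits a file name into { referencedName, parentRegion }.
     * parentRegion is null when the name carries no '@' and is therefore not a reference.
     */
    static String[] split(String fileName) {
        int at = fileName.lastIndexOf('@');
        if (at < 0) {
            return new String[] { fileName, null };
        }
        return new String[] { fileName.substring(0, at), fileName.substring(at + 1) };
    }
}
```

Because the separator is peeled off before any link parsing happens, a name such as hfile-region-table@parentregion yields the intact link name hfile-region-table plus the parent region, even when the table name itself contains dots.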

Other follow-ons -- ideally we are more robust by quarantining a bad region or
hfiles/linksfiles if it has killed a few nodes in the cluster.


[jira] [Updated] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

[
https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jonathan Hsieh updated HBASE-7339:
--

Description:
Steps:
- Have a single region table with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot
- region post-open task attempts to compact region. policy does not compact
all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- it starts splitting
- creating split references, opening daughters fails
- hfile links are split, creating hfile link daughter refs.
{{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table
{{table.parentregion}} - {{hfile\-region\-table.parentregion}}
(groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server. It then
spreads to the next server.


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530273#comment-13530273
 ] 

Todd Lipcon commented on HBASE-7317:


We can't put it in org.apache.* unless it's an Apache project. If you want to 
submit it to the incubator as a project I would be interested in joining up, 
but our thinking at the time of development was that it's a small enough piece 
of code that it would be easier to just develop on github until it got traction 
in a bunch of projects.

There's no restriction that Apache projects only depend on other Apache 
projects - eg we depend on Google libraries like protobuf and guava.


[jira] [Commented] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

[
https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530274#comment-13530274
]

Jonathan Hsieh commented on HBASE-7339:
---

I'm going to pursue #1 and then #2 first.

Splitting a hfilelink causes region servers to go down.
---

[jira] [Comment Edited] (HBASE-7339) Splitting a hfilelink causes region servers to go down.

[
https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530267#comment-13530267
]

Jonathan Hsieh edited comment on HBASE-7339 at 12/12/12 8:06 PM:
-

This was encountered when testing online snapshots, but will affect offline
snapshots as well.

Suggested solutions:
1) Make opening the hfile-link daughter reference more robust by attempting to
treat the file as a reference if treating it as a link fails. Hacky but should
work.
2) Make the regexes used to differentiate references and hfilelinks stricter
so that we can tell the two apart. Hacky, not sure if it will work.
3) Change the daughter reference link file name to be more robust. Currently
it is 'hfile.parentregion'; maybe change it to 'hfile@parentregion'. This would
then allow 'hfile\-region\-table@parentregion' to be interpreted
correctly. This is the right way but breaks compatibility.

Other follow-ons -- ideally we are more robust by quarantining a bad region or
hfiles/linksfiles if it has killed a few nodes in the cluster.

was (Author: jmhsieh):
This was encountered when testing online snapshots, but will affect offline
snapshots as well.

Other follow-ons -- ideally we are more robust by quarantining a bad region or
hfiles/linksfiles if it has killed a few nodes in the cluster.

Splitting a hfilelink causes region servers to go down.
---

[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530275#comment-13530275
 ] 

Andrew Purtell commented on HBASE-7317:
---

I'm pretty sure the thinking is a grant of this code to the Apache Hadoop 
project, not the formation of a full-fledged project.


[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530277#comment-13530277
 ] 

Enis Soztutar commented on HBASE-7243:
--

Also, can you interrupt the Worker thread on timeout? 
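
A minimal sketch of interrupting a worker on timeout with java.util.concurrent is shown below. TimedWorker and runWithTimeout are hypothetical names; the actual Worker class in the attached diff is not shown in this thread and may be structured differently.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: run a task and interrupt its thread if it exceeds the timeout. */
final class TimedWorker {

    /** Returns true if the work finished in time; otherwise interrupts it and returns false. */
    static boolean runWithTimeout(Runnable work, long timeout, TimeUnit unit) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<?> future = executor.submit(work);
        try {
            future.get(timeout, unit);
            return true;
        } catch (TimeoutException e) {
            future.cancel(true); // true = interrupt the thread running the work
            return false;
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return false;
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

The interrupt only helps if the worker reacts to it, for example by letting InterruptedException propagate out of blocking calls or by checking Thread.currentThread().isInterrupted() in its loop.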


[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530278#comment-13530278
]

Hadoop QA commented on HBASE-7268:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560337/HBASE-7268-v2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
104 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 22 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3495//console

This message is automatically generated.

correct local region location cache information can be overwritten w/stale
information from an old server
-

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch

[jira] [Updated] (HBASE-7339) Splitting a hfilelink causes region servers to go down.


 [ 
https://issues.apache.org/jira/browse/HBASE-7339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-7339:
--

Description: 
Steps:
- Have a single region table t with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot to table t'. 
- t' has its region do a post-open task that attempts to compact region.  
policy does not compact all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- t' starts splitting
- creating split references, opening daughters fails 
- hfile links are split, creating hfile link daughter refs.  
{{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table 
{{table.parentregion}} - {{hfile\-region\-table.parentregion}} 
 (groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server.  It then 
spreads to the next server.

  was:
Steps:
- Have a single region table with 15 hfiles in it.
- Snapshot it. (was done using online snapshot from HBASE-7321)
- Clone a snapshot 
- region post-open task attempts to compact region.  policy does not compact 
all files. (default seems to be 10)
- after compaction we have hfile links and real hfiles mixed in the region
- it starts splitting
- creating split references, opening daughters fails 
- hfile links are split, creating hfile link daughter refs.  
{{hfile\-region\-table.parentregion}}
- these split hfile links are interpreted as hfile links with table 
{{table.parentregion}} - {{hfile\-region\-table.parentregion}} 
 (groupings interpreted incorrectly)
- Since this is after the splitting PONR, this aborts the server.  It then 
spreads to the next server.


 Splitting a hfilelink causes region servers to go down.
 ---

 Key: HBASE-7339
 URL: https://issues.apache.org/jira/browse/HBASE-7339
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: hbase-6055
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
Priority: Blocker
 Fix For: hbase-6055


 Steps:
 - Have a single region table t with 15 hfiles in it.
 - Snapshot it. (was done using online snapshot from HBASE-7321)
 - Clone a snapshot to table t'. 
 - t' has its region do a post-open task that attempts to compact region.  
 policy does not compact all files. (default seems to be 10)
 - after compaction we have hfile links and real hfiles mixed in the region
 - t' starts splitting
 - creating split references, opening daughters fails 
 - hfile links are split, creating hfile link daughter refs.  
 {{hfile\-region\-table.parentregion}}
 - these split hfile links are interpreted as hfile links with table 
 {{table.parentregion}} - 
 {{hfile\-region\-table.parentregion}}  (groupings interpreted 
 incorrectly)
 - Since this is after the splitting PONR, this aborts the server.  It then 
 spreads to the next server.
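
The misparse described in the steps above can be illustrated with a small regular-expression sketch. The pattern below is an assumption made for illustration, not the actual HFileLink pattern: it assumes link names of the form hfile-region-table and a table group that permits dots, which is enough to show how a split reference name of the form hfile-region-table.parentregion gets accepted as a link whose table is table.parentregion. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HFileLinkNameSketch {
    // Hypothetical link-name pattern: <hfile>-<region>-<table>.
    // The table group permits '.', which lets a reference suffix leak into it.
    static final Pattern LINK =
        Pattern.compile("^([0-9a-f]+)-([0-9a-f]+)-([a-zA-Z_0-9.]+)$");

    // Returns the table group if the name parses as a link, else null.
    static String tableOf(String name) {
        Matcher m = LINK.matcher(name);
        return m.matches() ? m.group(3) : null;
    }

    // One possible disambiguation (an assumption, not the committed fix):
    // peel off a trailing ".<parentregion>" before parsing the link part.
    // This only works if table names themselves never contain '.'.
    static String tableOfStrict(String name) {
        int dot = name.lastIndexOf('.');
        String linkPart = dot >= 0 ? name.substring(0, dot) : name;
        return tableOf(linkPart);
    }

    public static void main(String[] args) {
        // A split reference to an hfile link: hfile-region-table.parentregion
        String splitRef = "abc123-def456-mytable.0123abcd";
        System.out.println(tableOf(splitRef));       // table group swallows the suffix
        System.out.println(tableOfStrict(splitRef)); // suffix removed first
    }
}
```

With the loose pattern the reference suffix ends up inside the table group, which matches the "groupings interpreted incorrectly" symptom in the report.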

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Created] (HBASE-7340) Allow user-specified actions following region movement

Ted Yu created HBASE-7340:
-

 Summary: Allow user-specified actions following region movement
 Key: HBASE-7340
 URL: https://issues.apache.org/jira/browse/HBASE-7340
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


Sometimes a user performs a compaction after a region is moved (by the balancer). We 
should provide a 'hook' which lets the user specify what follow-on actions to take 
after region movement.
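
As a rough illustration of what such a hook could look like, here is a minimal listener-style sketch. Everything in it is hypothetical: RegionMoveListener, register and regionMoved are names invented for this example and are not existing HBase interfaces; the issue does not prescribe a design.

```java
import java.util.ArrayList;
import java.util.List;

public class RegionMoveHookSketch {
    // Hypothetical callback; not an existing HBase interface.
    public interface RegionMoveListener {
        void postMove(String regionName, String fromServer, String toServer);
    }

    private final List<RegionMoveListener> listeners = new ArrayList<>();

    public void register(RegionMoveListener listener) {
        listeners.add(listener);
    }

    // Stand-in for the point where a region move has completed.
    public void regionMoved(String region, String from, String to) {
        for (RegionMoveListener listener : listeners) {
            listener.postMove(region, from, to);
        }
    }

    public static void main(String[] args) {
        RegionMoveHookSketch hooks = new RegionMoveHookSketch();
        List<String> compactionQueue = new ArrayList<>();
        // Example follow-on action: queue the moved region for compaction.
        hooks.register((region, from, to) -> compactionQueue.add(region));
        hooks.regionMoved("exampleRegion", "rs1", "rs2");
        System.out.println(compactionQueue);
    }
}
```

A callback list keeps the follow-on action (compaction here) out of the balancer itself; whether that belongs in a coprocessor or elsewhere is left open by the issue.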

See discussion on user mailing list under the thread 'How to know it's time for 
a major compaction?' for background information


[jira] [Commented] (HBASE-7327) Assignment Timeouts: Remove the code from the master

2012-12-12 Thread nkeywal (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530313#comment-13530313
]

nkeywal commented on HBASE-7327:

I've got some doubts on TestMasterFailover.
As written, on a master failover the code looks at what is in ZK and, if the
regionserver is down, forces a reassign; if not, it puts the region in the RIT
list.

Many tests in TestMasterFailover put a given state in ZK, but keep the
regionserver up. This way, it's actually the timeout that is managing the
region status. It's fast because the timeout is set to a few seconds. But we
should have a test with a real failover, with standard cases, and they should
be fast without setting a timeout to 2 seconds or so.

So:
- this test shows a specific usage of the timeout: being a garbage collector
when we put ourselves in an unexpected situation
- doesn't prove that we're effectively recovering quickly when we have a master
failover, because the very short timeout hides the problem.

As an example, it seems that if the master fails just after creating an offline
znode (before contacting the region server), we need the timeout to recover the
region (i.e. 10 minutes). If confirmed (I will recheck tomorrow), it would be a
bug (not that simple to fix actually), but we don't see it because of this
short timeout.

And so, I'm thinking about:
- refactoring the tests to express the cases that can occur during a master
failover (including a region server crash, but maybe that exists already)
- keeping the timeout, but only as a safety net, without doing anything if the
region is allocated to a live region server. Maybe we will need extra cases
here; I need to study the code more.
- maybe adding extra code if we identify a region that has been opening for too
long on a live server: calling the server to check its status, releasing the
region, or something similar. To be discussed :-)

Assignment Timeouts: Remove the code from the master

Key: HBASE-7327
URL: https://issues.apache.org/jira/browse/HBASE-7327
Project: HBase
Issue Type: Improvement
Components: master
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
Attachments: 7327.v1.uncomplete.patch

As per HBASE-7247...

[jira] [Commented] (HBASE-7236) add per-table/per-cf configuration via metadata

[
https://issues.apache.org/jira/browse/HBASE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530315#comment-13530315
]

Hadoop QA commented on HBASE-7236:
--

{color:red}-1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12560500/HBASE-7236-v1.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author
tags.

{color:green}+1 tests included{color}. The patch appears to include 25 new
or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop
2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated
105 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 23 new
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestShell
org.apache.hadoop.hbase.TestDrainingServer
org.apache.hadoop.hbase.client.TestMultiParallel

Test results:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//testReport/
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output:
https://builds.apache.org/job/PreCommit-HBASE-Build/3496//console

This message is automatically generated.

add per-table/per-cf configuration via metadata
---

Key: HBASE-7236
URL: https://issues.apache.org/jira/browse/HBASE-7236
Project: HBase
Issue Type: New Feature
Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Attachments: HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE.patch,
HBASE-7236-PROTOTYPE-v1.patch, HBASE-7236-v0.patch, HBASE-7236-v1.patch

Regardless of the compaction policy, it makes sense to have separate
configuration for compactions for different tables and column families, as
their access patterns and workloads can be different. In particular, for
tiered compactions that are being ported from 0.89-fb branch it is necessary
to have, to use it properly.
We might want to add support for compaction configuration via metadata on
table/cf.

[jira] [Updated] (HBASE-7340) Allow user-specified actions following region movement

2012-12-12 Thread Otis Gospodnetic (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated HBASE-7340:


Description: 
Sometimes a user performs a compaction after a region is moved (by the balancer). We 
should provide a 'hook' which lets the user specify what follow-on actions to take 
after region movement.

See discussion on user mailing list under the thread 'How to know it's time for 
a major compaction?' for background information: 
http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+

  was:
Sometimes a user performs a compaction after a region is moved (by the balancer). We 
should provide a 'hook' which lets the user specify what follow-on actions to take 
after region movement.

See discussion on user mailing list under the thread 'How to know it's time for 
a major compaction?' for background information


 Allow user-specified actions following region movement
 --

 Key: HBASE-7340
 URL: https://issues.apache.org/jira/browse/HBASE-7340
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 Sometimes a user performs a compaction after a region is moved (by the balancer). We 
 should provide a 'hook' which lets the user specify what follow-on actions to take 
 after region movement.
 See discussion on user mailing list under the thread 'How to know it's time 
 for a major compaction?' for background information: 
 http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+


[jira] [Commented] (HBASE-7338) Fix flaky condition for org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange


[ 
https://issues.apache.org/jira/browse/HBASE-7338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530324#comment-13530324
 ] 

Hadoop QA commented on HBASE-7338:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12560621/HBASE-7338.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
104 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 23 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestMasterMetrics

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3497//console


 Fix flaky condition for 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange
 -

 Key: HBASE-7338
 URL: https://issues.apache.org/jira/browse/HBASE-7338
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3, 0.96.0
Reporter: Himanshu Vashishtha
Priority: Minor
 Attachments: HBASE-7338.patch


 The balancer doesn't run while a region is in transition. The check to 
 confirm whether all regions are assigned looks for region count  22, 
 where the total number of regions is 27. This may result in a failure:
 {code}
 java.lang.AssertionError: After 5 attempts, region assignments were not 
 balanced.
   at org.junit.Assert.fail(Assert.java:93)
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.assertRegionsAreBalanced(TestRegionRebalancing.java:203)
   at 
 org.apache.hadoop.hbase.TestRegionRebalancing.testRebalanceOnRegionServerNumberChange(TestRegionRebalancing.java:123)
 .
 2012-12-11 13:47:02,231 INFO  [pool-1-thread-1] 
 hbase.TestRegionRebalancing(120): Added fourth 
 server=p0118.mtv.cloudera.com,44414,1355262422083
 2012-12-11 13:47:02,231 INFO  
 [RegionServer:3;p0118.mtv.cloudera.com,44414,1355262422083] 
 regionserver.HRegionServer(3769): Registered RegionServer MXBean
 2012-12-11 13:47:02,231 DEBUG [pool-1-thread-1] master.HMaster(987): Not 
 running balancer because 1 region(s) in transition: 
 {c786446fb2542f190e937057cdc79d9d=test,kkk,1355262401365.c786446fb2542f190e937057cdc79d9d.
  state=OPENING, ts=1355262421037, 
 server=p0118.mtv.cloudera.com,54281,1355262419765}
 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
 hbase.TestRegionRebalancing(165): There are 4 servers and 26 regions. Load 
 Average: 13.0 low border: 9, up border: 16; attempt: 0
 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
 hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,51590,1355262395329 
 Avg: 13.0 actual: 11
 2012-12-11 13:47:02,232 DEBUG [pool-1-thread-1] 
 hbase.TestRegionRebalancing(171): p0118.mtv.cloudera.com,52987,1355262407916 
 Avg: 13.0 actual: 15
 2012-12-11 13:47:02,233 DEBUG [pool-1-thread-1] 
 hbase.TestRegionRebalancing(171):

[jira] [Created] (HBASE-7341) Deprecate RowLocks in 0.94

2012-12-12 Thread Gregory Chanan (JIRA)

Gregory Chanan created HBASE-7341:
-

 Summary: Deprecate RowLocks in 0.94
 Key: HBASE-7341
 URL: https://issues.apache.org/jira/browse/HBASE-7341
 Project: HBase
  Issue Type: Task
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.94.4


Since we are removing support in 0.96 (see HBASE-7315), we should deprecate in 
0.94.


[jira] [Resolved] (HBASE-7022) Use multi to batch offline regions in zookeeper

2012-12-12 Thread Jimmy Xiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang resolved HBASE-7022.


Resolution: Won't Fix
  Assignee: Jimmy Xiang

Patched ZooKeeper with async multi support.  Tried to use it to batch offline 
regions, but didn't get as much performance gain as expected.

 Use multi to batch offline regions in zookeeper
 ---

 Key: HBASE-7022
 URL: https://issues.apache.org/jira/browse/HBASE-7022
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang

 Bulk assigner needs to set regions offline in zookeeper one by one. I was 
 wondering if we can have some performance improvement if we batch these 
 operations using ZooKeeper#multi.


[jira] [Created] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error

2012-12-12 Thread Aleksandr Shulman (JIRA)

Aleksandr Shulman created HBASE-7342:


 Summary: Split operation without split key incorrectly finds the 
middle key in off-by-one error
 Key: HBASE-7342
 URL: https://issues.apache.org/jira/browse/HBASE-7342
 Project: HBase
  Issue Type: Bug
  Components: HFile, io
Affects Versions: 0.94.3, 0.94.2, 0.94.1, 0.96.0, 0.94.4
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
 Fix For: 0.96.0, 0.94.2


I took a deeper look into issues I was having using region splitting when 
specifying a region (but not a key for splitting).

The midkey calculation is off by one and, when there are 2 rows, will pick the 
0th one. This causes the firstkey to be the same as the midkey, and the split will 
fail. Removing the -1 causes it to work correctly, as per the test I've added.

Looking into the code here is what goes on:

1. Split takes the largest storefile
2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key i 
resides at blockKeys[i]
3. Getting the middle root-level index should yield the key in the middle of 
the storefile
4. In step 3, we see that there is a possibly erroneous (-1) to adjust for the 
0-offset indexing.
5. In a case where there are only 2 blockKeys, this yields the 0th block 
key. 
6. Unfortunately, this is the same block key that 'firstKey' will be.
7. This yields the result in HStore.java:1873 (cannot split because midkey is 
the same as first or last row)
8. Removing the -1 solves the problem (in this case). 
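
The arithmetic behind steps 4 to 8 can be checked with a short sketch. The report does not quote the actual index expression, so the two forms below, (rootCount - 1) / 2 and rootCount / 2, are assumptions about what "with the -1" and "without the -1" look like; the class and method names are hypothetical.

```java
public class MidKeySketch {
    // Index selection with the "-1" adjustment the report describes (assumed form).
    static int midIndexWithAdjustment(int rootCount) {
        return (rootCount - 1) / 2;
    }

    // Index selection with the "-1" removed (assumed form).
    static int midIndexWithoutAdjustment(int rootCount) {
        return rootCount / 2;
    }

    public static void main(String[] args) {
        String[] blockKeys = {"row-a", "row-b"}; // only two root-level entries
        String firstKey = blockKeys[0];

        String withAdj = blockKeys[midIndexWithAdjustment(blockKeys.length)];
        String withoutAdj = blockKeys[midIndexWithoutAdjustment(blockKeys.length)];

        // With the adjustment the midkey equals firstKey, so the split is refused.
        System.out.println("with -1:    midkey == firstKey? " + withAdj.equals(firstKey));
        // Without it the midkey is the second entry and the split can proceed.
        System.out.println("without -1: midkey == firstKey? " + withoutAdj.equals(firstKey));
    }
}
```

With two entries, integer division gives (2 - 1) / 2 = 0 and 2 / 2 = 1, which is the difference between picking firstKey and picking a usable split point.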


[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530403#comment-13530403
 ] 

Todd Lipcon commented on HBASE-7317:


I wouldn't want to put it in Hadoop common -- then we'd have to do elaborate 
stubbing in our compat code in order to use it while still supporting older 
versions. It is also useful for non-Hadoop projects (eg something like 
Cassandra)

 server-side request problems are hard to debug
 --

 Key: HBASE-7317
 URL: https://issues.apache.org/jira/browse/HBASE-7317
 Project: HBase
  Issue Type: Brainstorming
  Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

 I've seen cases during integration tests where the write or read request took 
 an unexpectedly large amount of time (that, after the client went to the 
 region server that is reported alive and well, which I know from temporary 
 debug logging :)), and it's impossible to understand what is going on on the 
 server side, short of catching the moment with jstack.
 Some solutions (off by default) could be 
 - a facility for tests (especially integration tests) that would trace 
 Server/Master calls into some log or file (won't help with internals but at 
 least one could see what was actually received);
 - logging the progress of requests between components inside master/server 
 (e.g. request id=N received, request id=N is being processed in MyClass, 
 N being drawn on client from local sequence - no guarantees of uniqueness are 
 necessary).
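
The second idea above, tagging each request with an id drawn from a client-local sequence and logging it as the request passes through components, could be sketched as follows. This is a minimal illustration under stated assumptions: RequestTraceSketch, nextRequestId and trace are hypothetical names, and an in-memory list stands in for a real log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class RequestTraceSketch {
    // Client-local sequence; ids are not globally unique, which the issue says is fine.
    private static final AtomicLong SEQ = new AtomicLong();

    // In-memory stand-in for a log file.
    static final List<String> LOG = Collections.synchronizedList(new ArrayList<>());

    static long nextRequestId() {
        return SEQ.incrementAndGet();
    }

    // Records one progress event for a request inside a named component.
    static void trace(long requestId, String component, String event) {
        LOG.add("request id=" + requestId + " " + event + " in " + component);
    }

    public static void main(String[] args) {
        long id = nextRequestId();
        trace(id, "RpcServer", "received");
        trace(id, "MyClass", "is being processed");
        for (String line : LOG) {
            System.out.println(line);
        }
    }
}
```

Searching the log for a single id then shows how far a slow request got on the server side, without needing to catch it with jstack.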


[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.


[ 
https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530404#comment-13530404
 ] 

Andrew Purtell commented on HBASE-7331:
---

All tests pass for me locally except for 
TestHBaseFsck#testRegionShouldNotBeDeployed, which seems an unrelated failure.

 Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow 
 and stop region server. 
 --

 Key: HBASE-7331
 URL: https://issues.apache.org/jira/browse/HBASE-7331
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk.patch


 The following APIs in HRegionServer are either missing hooks to coprocessor 
 or the hooks are not implemented in the AccessController class for security. 
 As a result any unauthorized user can:
 1. Open a region
 2. Close a region
 3. Stop region server
 4. Lock a row
 5. Unlock a row.


[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition

2012-12-12 Thread Jimmy Xiang (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530405#comment-13530405
]

Jimmy Xiang commented on HBASE-7335:

This region should be a daughter region. The region split should have succeeded.
It looks to me like the parent region was removed while daughter regions still
refer to it. Since the parent region is gone, we have no choice but to remove
the reference file.

Do you have any data loss?

Failed split can cause a region to get stuck in transition
--

Key: HBASE-7335
URL: https://issues.apache.org/jira/browse/HBASE-7335
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.92.1
Reporter: Kyle McGovern

Trying to reassign a region after a failed split causes that region to get
stuck in transition.
hdfs dfs -R output
http://pastebin.com/F4DgTxj1
hbck output
http://pastebin.com/BaftESBd
error on regionserver
http://pastebin.com/Mye60rUA
For example, if I remove
/hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590
the region will successfully assign and hbck does not show errors for this
region anymore. The contents of the file appear to just be a split key.

[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530410#comment-13530410
 ] 

Andrew Purtell commented on HBASE-7317:
---

{quote}
Todd: I wouldn't want to put it in Hadoop common – then we'd have to do 
elaborate stubbing in our compat code in order to use it while still supporting 
older versions. It is also useful for non-Hadoop projects (eg something like 
Cassandra)
{quote}

That's disappointing. Then my concern about depending on a project in this 
state stands.

{quote}
Stack: Hope was that we'd add tracing to hbase w/ this as a start (and that 
hadoop itself would be adding trace I suppose so we could go down into 
datanodes). If no progress on tracing before, say 0.96, yeah, lets remove it. 
But maybe there will be progress made in this issue.
{quote}

Perhaps, otherwise +1 for removing it for 0.96.

 server-side request problems are hard to debug
 --

 Key: HBASE-7317
 URL: https://issues.apache.org/jira/browse/HBASE-7317
 Project: HBase
  Issue Type: Brainstorming
  Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

 I've seen cases during integration tests where the write or read request took 
 an unexpectedly large amount of time (that, after the client went to the 
 region server that is reported alive and well, which I know from temporary 
 debug logging :)), and it's impossible to understand what is going on on the 
 server side, short of catching the moment with jstack.
 Some solutions (off by default) could be 
 - a facility for tests (especially integration tests) that would trace 
 Server/Master calls into some log or file (won't help with internals but at 
 least one could see what was actually received);
 - logging the progress of requests between components inside master/server 
 (e.g. request id=N received, request id=N is being processed in MyClass, 
 N being drawn on client from local sequence - no guarantees of uniqueness are 
 necessary).


[jira] [Updated] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error

[
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ted Yu updated HBASE-7342:
--

Affects Version/s: (was: 0.94.4)
Fix Version/s: (was: 0.94.2)
0.94.4

Split operation without split key incorrectly finds the middle key in
off-by-one error
--

Key: HBASE-7342
URL: https://issues.apache.org/jira/browse/HBASE-7342
Project: HBase
Issue Type: Bug
Components: HFile, io
Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
Fix For: 0.96.0, 0.94.4

I took a deeper look into issues I was having using region splitting when
specifying a region (but not a key for splitting).
The midkey calculation is off by one and, when there are 2 rows, will pick the
0th one. This causes the firstkey to be the same as the midkey, and the split
will fail. Removing the -1 causes it to work correctly, as per the test I've added.
Looking into the code here is what goes on:
1. Split takes the largest storefile
2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key
i resides at blockKeys[i]
3. Getting the middle root-level index should yield the key in the middle of
the storefile
4. In step 3, we see that there is a possibly erroneous (-1) to adjust for
the 0-offset indexing.
5. In a case where there are only 2 blockKeys, this yields the 0th
block key.
6. Unfortunately, this is the same block key that 'firstKey' will be.
7. This yields the result in HStore.java:1873 (cannot split because midkey
is the same as first or last row)
8. Removing the -1 solves the problem (in this case).

[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug

[
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530424#comment-13530424
]

Todd Lipcon commented on HBASE-7317:

bq. That's disappointing. Then my concern about depending on a project in this
state stands.

What do you mean? If there are bugs in the code, feel free to submit patches,
and I'm happy to integrate them (I have commit access to the repo). If we end
up with several contributors, I don't foresee any issues proposing it for
Apache incubation.

server-side request problems are hard to debug
--

Key: HBASE-7317
URL: https://issues.apache.org/jira/browse/HBASE-7317
Project: HBase
Issue Type: Brainstorming
Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

I've seen cases during integration tests where the write or read request took
an unexpectedly large amount of time (that, after the client went to the
region server that is reported alive and well, which I know from temporary
debug logging :)), and it's impossible to understand what is going on on the
server side, short of catching the moment with jstack.
Some solutions (off by default) could be
- a facility for tests (especially integration tests) that would trace
Server/Master calls into some log or file (won't help with internals but at
least one could see what was actually received);
- logging the progress of requests between components inside master/server
(e.g. request id=N received, request id=N is being processed in MyClass,
N being drawn on client from local sequence - no guarantees of uniqueness are
necessary).

[jira] [Commented] (HBASE-7317) server-side request problems are hard to debug


[ 
https://issues.apache.org/jira/browse/HBASE-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530429#comment-13530429
 ] 

Andrew Purtell commented on HBASE-7317:
---

{quote}
If there are bugs in the code, feel free to submit patches, and I'm happy to 
integrate them (I have commit access to the repo). If we end up with several 
contributors, I don't foresee any issues proposing it for Apache incubation.
{quote}

If there's progress on tracing, and certainly if this happens, then I won't be 
concerned, yes.

 server-side request problems are hard to debug
 --

 Key: HBASE-7317
 URL: https://issues.apache.org/jira/browse/HBASE-7317
 Project: HBase
  Issue Type: Brainstorming
  Components: IPC/RPC, regionserver
Reporter: Sergey Shelukhin
Priority: Minor

 I've seen cases during integration tests where the write or read request took 
 an unexpectedly large amount of time (that, after the client went to the 
 region server that is reported alive and well, which I know from temporary 
 debug logging :)), and it's impossible to understand what is going on on the 
 server side, short of catching the moment with jstack.
 Some solutions (off by default) could be 
 - a facility for tests (especially integration tests) that would trace 
 Server/Master calls into some log or file (won't help with internals but at 
 least one could see what was actually received);
 - logging the progress of requests between components inside master/server 
 (e.g. request id=N received, request id=N is being processed in MyClass, 
 N being drawn on client from local sequence - no guarantees of uniqueness are 
 necessary).


[jira] [Updated] (HBASE-7233) Serializing KeyValues


 [ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7233:
-

Attachment: 7233v5_encoders.txt

Move stuff around per your review [~mcorgan].

I removed Encoder and Decoder.  They add little.  Yeah, it means IOException, 
but most of the time that's what we'll be throwing at its base when 
encoding/decoding.

I think we need to rename CellScanner to CellInputStream and change the method 
name from next to read, especially when you look at this patch.  What do you 
think, Matt?

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt


 Undo KeyValue being a Writable.


[jira] [Updated] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error

2012-12-12 Thread Aleksandr Shulman (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aleksandr Shulman updated HBASE-7342:
-

Attachment: HBASE-7342-v1.patch

Split operation without split key incorrectly finds the middle key in
off-by-one error
--

Attachments: HBASE-7342-v1.patch

[jira] [Updated] (HBASE-7236) add per-table/per-cf configuration via metadata


 [ 
https://issues.apache.org/jira/browse/HBASE-7236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7236:


Attachment: HBASE-7236-v2.patch

Fixed TestShell. TestMultiParallel and TestDrainingServer are flaky and pass 
locally.

 add per-table/per-cf configuration via metadata
 ---

 Key: HBASE-7236
 URL: https://issues.apache.org/jira/browse/HBASE-7236
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7236-PROTOTYPE.patch, HBASE-7236-PROTOTYPE.patch, 
 HBASE-7236-PROTOTYPE-v1.patch, HBASE-7236-v0.patch, HBASE-7236-v1.patch, 
 HBASE-7236-v2.patch


 Regardless of the compaction policy, it makes sense to have separate 
 configuration for compactions for different tables and column families, as 
 their access patterns and workloads can be different. In particular, the 
 tiered compactions that are being ported from the 0.89-fb branch need this 
 in order to be used properly.
 We might want to add support for compaction configuration via metadata on 
 table/cf.


[jira] [Updated] (HBASE-7243) Test for creating a large number of regions


 [ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7243:


Attachment: 7243-integration-test-many-splits.diff

Done and done.

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Updated] (HBASE-7243) Test for creating a large number of regions


 [ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7243:


Status: Open  (was: Patch Available)

Canceling request for first patch, which hasn't run yet.

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Updated] (HBASE-7243) Test for creating a large number of regions


 [ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7243:


Status: Patch Available  (was: Open)

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk - first slice (not configurable by cf or dynamically)


[ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530459#comment-13530459
 ] 

Enis Soztutar commented on HBASE-7055:
--

The patch at RB looks good to go. Ted, Stack do you guys want to review? 

 port HBASE-6371 tier-based compaction from 0.89-fb to trunk - first slice 
 (not configurable by cf or dynamically)
 -

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
  Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.96.0

 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, 
 HBASE-6371-v3-refactor-only-squashed.patch, 
 HBASE-6371-v4-refactor-only-squashed.patch, 
 HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, 
 HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch


 There's divergence in the code :(
 See HBASE-6371 for details.


[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error


[ 
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530463#comment-13530463
 ] 

Ted Yu commented on HBASE-7342:
---

{code}
+System.out.println("Original table has: " + loadedTableCount + " rows");
{code}
Please use LOG variable for the above.
{code}
+  Thread.currentThread();
{code}
Does the above statement have any effect ?
{code}
+Thread.sleep(1000);
{code}
Can the sleep duration be shorter ?
{code}
+  } catch (InterruptedException e) {
+e.printStackTrace();
{code}
Throw InterruptedIOException from the catch block.
{code}
+return;
+
{code}
nit: remove the empty line.
{code}
+throw new Exception("Split did not increase the number of regions");
{code}
nit: use fail().
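
The review points above can be pictured with a small stand-alone sketch. It is not taken from the patch: SplitWaitSketch, RegionCounter and waitForRegionCount are hypothetical names, and java.util.logging stands in for the LOG variable of the real test class. It only shows the shape of a wait loop that logs through a logger, sleeps for a caller-chosen duration, and turns an interrupt into an InterruptedIOException instead of printing the stack trace.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.logging.Logger;

// Hypothetical sketch; none of these names come from the HBASE-7342 patch.
final class SplitWaitSketch {
    // Stand-in for the test class's LOG variable (assumption: any logger works here).
    private static final Logger LOG = Logger.getLogger(SplitWaitSketch.class.getName());

    /** Hypothetical stand-in for "ask the cluster how many regions the table has". */
    interface RegionCounter {
        int count() throws IOException;
    }

    /**
     * Polls until at least {@code expected} regions are seen or the attempts run out.
     * The sleep duration is a parameter so a test can keep it short.
     */
    static int waitForRegionCount(RegionCounter counter, int expected,
                                  int attempts, long sleepMs) throws IOException {
        int seen = counter.count();
        for (int i = 0; i < attempts && seen < expected; i++) {
            try {
                Thread.sleep(sleepMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and surface the interruption to the caller.
                Thread.currentThread().interrupt();
                InterruptedIOException iioe =
                    new InterruptedIOException("Interrupted while waiting for split");
                iioe.initCause(e);
                throw iioe;
            }
            seen = counter.count();
        }
        LOG.info("Table has " + seen + " regions");
        return seen;
    }
}
```

A JUnit test built on such a helper would compare the returned count with the expected one and call fail() with a message, rather than throwing a bare Exception.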

 Split operation without split key incorrectly finds the middle key in 
 off-by-one error
 --

 Key: HBASE-7342
 URL: https://issues.apache.org/jira/browse/HBASE-7342
 Project: HBase
  Issue Type: Bug
  Components: HFile, io
Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-7342-v1.patch


 I took a deeper look into issues I was having using region splitting when 
 specifying a region (but not a key for splitting).
 The midkey calculation is off by one and, when there are 2 rows, will pick the 
 0th one. This causes the firstkey to be the same as midkey and the split will 
 fail. Removing the -1 causes it to work correctly, as per the test I've added.
 Looking into the code, here is what goes on:
 1. Split takes the largest storefile
 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key 
 i resides at blockKeys[i]
 3. Getting the middle root-level index should yield the key in the middle of 
 the storefile
 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for 
 the 0-offset indexing.
 5. In a case where there are only 2 blockKeys, this yields the 0th 
 block key. 
 6. Unfortunately, this is the same block key that 'firstKey' will be.
 7. This yields the result in HStore.java:1873 (cannot split because midkey 
 is the same as first or last row)
 8. Removing the -1 solves the problem (in this case). 
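
The arithmetic in steps 4 to 8 can be checked with a minimal sketch. It is not the HBase source: MidKeySketch and its methods are hypothetical, and the form (rootCount - 1) / 2 is an assumption that matches the description (a -1 adjustment that picks index 0 when there are two block keys).

```java
// Hypothetical sketch, not the HBase source: models the root-level block
// index as a plain array and compares the two index calculations.
final class MidKeySketch {
    // Assumed form of the calculation with the "-1" adjustment.
    static int midIndexWithMinusOne(int rootCount) {
        return (rootCount - 1) / 2;
    }

    // The calculation after removing the "-1".
    static int midIndexWithoutMinusOne(int rootCount) {
        return rootCount / 2;
    }

    public static void main(String[] args) {
        String[] blockKeys = {"row-a", "row-b"};
        String firstKey = blockKeys[0];
        String before = blockKeys[midIndexWithMinusOne(blockKeys.length)];
        String after = blockKeys[midIndexWithoutMinusOne(blockKeys.length)];
        // With two keys the "-1" variant lands on the first key, so a split is refused.
        System.out.println("with -1:    midkey=" + before
            + " sameAsFirstKey=" + before.equals(firstKey));
        System.out.println("without -1: midkey=" + after
            + " sameAsFirstKey=" + after.equals(firstKey));
    }
}
```

Under integer division the two variants differ only for even counts; for an odd number of block keys both return the same index, which fits the "(in this case)" caveat in step 8.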


[jira] [Commented] (HBASE-7335) Failed split can cause a region to get stuck in transition

2012-12-12 Thread Kyle McGovern (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530469#comment-13530469
 ] 

Kyle McGovern commented on HBASE-7335:
--

Thanks for the link to the JIRA. It doesn't appear there was any data loss.

 Failed split can cause a region to get stuck in transition
 --

 Key: HBASE-7335
 URL: https://issues.apache.org/jira/browse/HBASE-7335
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.92.1
Reporter: Kyle McGovern

 Trying to reassign a region after a failed split causes that region to get 
 stuck in transition. 
 hdfs dfs -R output
 http://pastebin.com/F4DgTxj1
 hbck output
 http://pastebin.com/BaftESBd
 error on regionserver
 http://pastebin.com/Mye60rUA
 For example, if I remove
 /hbase/mytable/2918ce63a9e0bf48b4f3227d88a992b2/RAW/990e00f1058442b3a79de8e39176b978.e6413e07faefd5801f25867ecbc97590
 the region will successfully assign and hbck does not show errors for this
 region anymore. The contents of the file appear to just be a split key.


[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error


[ 
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530472#comment-13530472
 ] 

Ted Yu commented on HBASE-7342:
---

There are compilation errors:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:2.5.1:testCompile 
(default-testCompile) on project hbase-server: Compilation failure: Compilation 
failure:
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[40,30]
 cannot find symbol
[ERROR] symbol  : class HServerAddress
[ERROR] location: package org.apache.hadoop.hbase
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[763,21]
 cannot find symbol
[ERROR] symbol  : class HServerAddress
[ERROR] location: class 
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[763,48]
 cannot find symbol
[ERROR] symbol  : method getRegionsInfo()
[ERROR] location: class org.apache.hadoop.hbase.client.HTable
[ERROR] 
/Users/zhihyu/trunk-hbase/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/TestSplitTransactionOnCluster.java:[772,17]
 cannot find symbol
[ERROR] symbol  : method getRegionsInfo()
[ERROR] location: class org.apache.hadoop.hbase.client.HTable
{code}
HServerAddress is replaced by ServerName in trunk.
getRegionsInfo() is replaced by getRegionLocations.

 Split operation without split key incorrectly finds the middle key in 
 off-by-one error
 --

 Key: HBASE-7342
 URL: https://issues.apache.org/jira/browse/HBASE-7342
 Project: HBase
  Issue Type: Bug
  Components: HFile, io
Affects Versions: 0.94.1, 0.94.2, 0.94.3, 0.96.0
Reporter: Aleksandr Shulman
Assignee: Aleksandr Shulman
Priority: Minor
 Fix For: 0.96.0, 0.94.4

 Attachments: HBASE-7342-v1.patch


 I took a deeper look into issues I was having using region splitting when 
 specifying a region (but not a key for splitting).
 The midkey calculation is off by one and, when there are 2 rows, will pick the 
 0th one. This causes the firstkey to be the same as midkey and the split will 
 fail. Removing the -1 causes it to work correctly, as per the test I've added.
 Looking into the code, here is what goes on:
 1. Split takes the largest storefile
 2. It puts all the keys into a 2-dimensional array called blockKeys[][]. Key 
 i resides at blockKeys[i]
 3. Getting the middle root-level index should yield the key in the middle of 
 the storefile
 4. In step 3, we see that there is a possibly erroneous (-1) to adjust for 
 the 0-offset indexing.
 5. In a case where there are only 2 blockKeys, this yields the 0th 
 block key. 
 6. Unfortunately, this is the same block key that 'firstKey' will be.
 7. This yields the result in HStore.java:1873 (cannot split because midkey 
 is the same as first or last row)
 8. Removing the -1 solves the problem (in this case). 


[jira] [Commented] (HBASE-7342) Split operation without split key incorrectly finds the middle key in off-by-one error

2012-12-12 Thread Aleksandr Shulman (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-7342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530478#comment-13530478
]

Aleksandr Shulman commented on HBASE-7342:
--

Noted...let me take a look.

Split operation without split key incorrectly finds the middle key in
off-by-one error
--

Attachments: HBASE-7342-v1.patch

[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530483#comment-13530483
 ] 

stack commented on HBASE-7243:
--

+1 from me too.  Will commit after hadoopqa run...

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement


[ 
https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530485#comment-13530485
 ] 

Ted Yu commented on HBASE-7340:
---

In HMaster.moveRegion(), we already have:
{code}
  this.assignmentManager.balance(rp);
  if (this.cpHost != null) {
    this.cpHost.postMove(hri, rp.getSource(), rp.getDestination());
  }
{code}
Meaning, a user can register a master coprocessor which would receive region 
movement notifications.

The assignmentManager.balance(plan) call in HMaster.balance() doesn't send out 
such a notification.
I think we can either add notification per region moved, or enhance the 
following hook (at line 1335) with list of regions moved:
{code}
this.cpHost.postBalance();
{code}
Comments are welcome.
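
To make the second option concrete, here is a minimal stand-alone sketch of a balance hook that receives the list of regions moved. Everything in it is hypothetical: BalanceHookSketch, Move and BalanceObserver are illustrative stand-ins, not HBase classes, and the postBalance signature shown is a possible shape rather than the existing coprocessor API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch; these types are stand-ins, not HBase classes.
final class BalanceHookSketch {
    /** Simplified stand-in for a region plan: a region plus source and destination server. */
    static final class Move {
        final String region;
        final String source;
        final String destination;

        Move(String region, String source, String destination) {
            this.region = region;
            this.source = source;
            this.destination = destination;
        }
    }

    /** Hypothetical observer whose postBalance hook is handed the moved regions. */
    interface BalanceObserver {
        void postBalance(List<Move> moves);
    }

    /** Executes every plan, then notifies the observer once with all moves. */
    static int balance(List<Move> plans, BalanceObserver observer) {
        List<Move> moved = new ArrayList<>();
        for (Move plan : plans) {
            // A real master would hand the plan to its assignment manager here.
            moved.add(plan);
        }
        if (observer != null) {
            observer.postBalance(Collections.unmodifiableList(moved));
        }
        return moved.size();
    }
}
```

With a hook of this shape a user could, for example, trigger a compaction for each region in the list. A per-region notification would instead call the observer inside the loop.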

 Allow user-specified actions following region movement
 --

 Key: HBASE-7340
 URL: https://issues.apache.org/jira/browse/HBASE-7340
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 Sometimes a user performs compaction after a region is moved (by the balancer). 
 We should provide a 'hook' which lets the user specify what follow-on actions 
 to take after region movement.
 See discussion on user mailing list under the thread 'How to know it's time 
 for a major compaction?' for background information: 
 http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+


[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530487#comment-13530487
 ] 

stack commented on HBASE-7243:
--

Maybe hadoopqa is messing w/ us again and ain't running.  Let me commit this.  
Usually we name patches w/ a version going forward: i.e. the third version has 
a v3 or something on it... FYI.

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Commented] (HBASE-7243) Test for creating a large number of regions


[ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530490#comment-13530490
 ] 

Nick Dimiduk commented on HBASE-7243:
-

Rgr. While you have your infra hat on: review board isn't posting notifications 
back to JIRA. Configuration bug?

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Updated] (HBASE-7243) Test for creating a large number of regions


 [ 
https://issues.apache.org/jira/browse/HBASE-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7243:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to trunk.  Thanks Nick for the patch.

 Test for creating a large number of regions
 ---

 Key: HBASE-7243
 URL: https://issues.apache.org/jira/browse/HBASE-7243
 Project: HBase
  Issue Type: Bug
  Components: Region Assignment, regionserver, test
Reporter: Enis Soztutar
Assignee: Nick Dimiduk
  Labels: noob
 Fix For: 0.96.0

 Attachments: 7243-integration-test-many-splits.diff, 
 7243-integration-test-many-splits.diff, 7243-integration-test-many-splits.diff


 After HBASE-7220, I think it will be good to write a unit test/IT to create a 
 large number of regions. We can put a reasonable timeout to the test. 


[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-12 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530499#comment-13530499
 ] 

Jean-Daniel Cryans commented on HBASE-7325:
---

[~gabriel.reid] alright +1

[~lhofhansl], I'm going to commit this to trunk but I was wondering if you'd 
want this in 0.94?

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.
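
The 10-second and 55-second figures can be reproduced with a short sketch. It assumes a linear back-off in which the multiplier grows by one per empty poll and is capped at the maximum; BackoffSketch and its methods are hypothetical names and this is not the ReplicationSource code.

```java
// Hypothetical sketch of a capped linear back-off; not the ReplicationSource code.
final class BackoffSketch {
    /** Sleep before the given attempt: base * min(attempt, maxMultiplier). */
    static long sleepMillis(long baseMillis, int attempt, int maxMultiplier) {
        return baseMillis * Math.min(attempt, maxMultiplier);
    }

    /** Total time spent sleeping over the first maxMultiplier attempts. */
    static long millisUntilMaxSleep(long baseMillis, int maxMultiplier) {
        long total = 0;
        for (int attempt = 1; attempt <= maxMultiplier; attempt++) {
            total += sleepMillis(baseMillis, attempt, maxMultiplier);
        }
        return total;
    }

    public static void main(String[] args) {
        long base = 1000L;      // 1 second base retry sleep time (default per the description)
        int maxMultiplier = 10; // maximum multiplier (default per the description)
        System.out.println("longest single sleep (ms): "
            + sleepMillis(base, maxMultiplier, maxMultiplier));
        System.out.println("idle time to reach it (ms): "
            + millisUntilMaxSleep(base, maxMultiplier));
    }
}
```

Under that assumption the sleeps are 1 s, 2 s, up to 10 s, which add up to 55 s of idle time, after which every further empty poll waits the full 10 s.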


[jira] [Commented] (HBASE-7233) Serializing KeyValues

2012-12-12 Thread Matt Corgan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530503#comment-13530503
 ] 

Matt Corgan commented on HBASE-7233:


Sounds good to me.  The IOException on CellInputStream.read() may not be ideal 
since it will force its way all the way up through the StoreFileScanner, 
StoreHeap, StoreScanner, RegionHeap, RegionScanner, etc...  I haven't thought 
of a better suggestion though.  Can change later if we think of something.
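
A small sketch of the concern, using hypothetical stand-ins rather than the classes in the patch: the interface shape, the null-at-end convention and the use of String in place of a cell are all assumptions. It shows how a checked IOException on read() has to be declared (or wrapped) by each layer stacked on top.

```java
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; interface shape and names are assumptions, not the patch.
final class CellInputStreamSketch {
    /** Assumed shape of the renamed interface: read() instead of next(). */
    interface CellInputStream {
        /** Returns the next cell, or null when the stream is exhausted (assumed convention). */
        String read() throws IOException;
    }

    /** A layer stacked on the stream must declare the same checked exception. */
    static final class CountingScanner {
        private final CellInputStream in;
        private int count;

        CountingScanner(CellInputStream in) {
            this.in = in;
        }

        String next() throws IOException {
            String cell = in.read();
            if (cell != null) {
                count++;
            }
            return cell;
        }

        int count() {
            return count;
        }
    }

    /** Builds an in-memory stream over a list of cells, with String standing in for a cell. */
    static CellInputStream over(List<String> cells) {
        Iterator<String> it = cells.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }
}
```

Every further wrapper (heap, region scanner and so on) would repeat the throws clause of CountingScanner.next(), which is the propagation described above.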

 Serializing KeyValues
 -

 Key: HBASE-7233
 URL: https://issues.apache.org/jira/browse/HBASE-7233
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.96.0

 Attachments: 7233sketch.txt, 7233.txt, 7233-v2.txt, 
 7233v3_encoders.txt, 7233v4_encoders.txt, 7233v5_encoders.txt


 Undo KeyValue being a Writable.


[jira] [Updated] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.

2012-12-12 Thread Vandana Ayyalasomayajula (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vandana Ayyalasomayajula updated HBASE-7331:


Attachment: HBASE-7331_trunk_02.patch

Fixed formatting errors.  One of the small tests, TestLruCache, fails for me 
intermittently; I am not sure if there is something wrong in my setup.

 Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow 
 and stop region server. 
 --

 Key: HBASE-7331
 URL: https://issues.apache.org/jira/browse/HBASE-7331
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk_02.patch, 
 HBASE-7331_trunk.patch


 The following APIs in HRegionServer are either missing hooks to coprocessor 
 or the hooks are not implemented in the AccessController class for security. 
 As a result any unauthorized user can:
 1. Open a region
 2. Close a region
 3. Stop region server
 4. Lock a row
 5. Unlock a row.


[jira] [Commented] (HBASE-7336) HFileBlock.readAtOffset does not work well with multiple threads


[ 
https://issues.apache.org/jira/browse/HBASE-7336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530512#comment-13530512
 ] 

Lars Hofhansl commented on HBASE-7336:
--

Any objections to committing this (0.94 and 0.96)? I'm pretty sure it won't 
make things worse, and it provably improves some scenarios.

 HFileBlock.readAtOffset does not work well with multiple threads
 

 Key: HBASE-7336
 URL: https://issues.apache.org/jira/browse/HBASE-7336
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.96.0, 0.94.4

 Attachments: 7336-0.94.txt, 7336-0.96.txt


 HBase grinds to a halt when many threads scan along the same set of blocks 
 and neither read short circuit nor block caching is enabled for the dfs 
 client ... disabling the block cache makes sense on very large scans.
 It turns out that synchronizing on istream in HFileBlock.readAtOffset is the 
 culprit.
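
The contention can be illustrated outside HBase with plain java.nio. The sketch below is an analogy under stated assumptions, not the HFileBlock code: PositionalReadSketch is a hypothetical class, seekAndRead mimics a seek-then-read on a shared stream that must hold a lock, and positionalRead mimics a pread-style call that carries its own offset.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical analogy using java.nio; not the HFileBlock or HDFS client code.
final class PositionalReadSketch {
    /**
     * seek + read: the file position is shared state, so concurrent callers
     * must serialize on the stream for the whole seek-and-read sequence.
     */
    static byte[] seekAndRead(RandomAccessFile in, long offset, int len) throws IOException {
        byte[] buf = new byte[len];
        synchronized (in) {
            in.seek(offset);
            in.readFully(buf);
        }
        return buf;
    }

    /**
     * Positional read: the offset travels with each call and the channel's own
     * position is left untouched, so no lock around the call is needed.
     */
    static byte[] positionalRead(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = ch.read(buf, offset + buf.position());
            if (n < 0) {
                throw new EOFException("Unexpected end of file");
            }
        }
        return buf.array();
    }
}
```

With many scanner threads, the first variant lets only one thread read at a time per stream, while the second lets reads proceed independently. That is the trade-off behind the stream-versus-pread discussion on this issue.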


[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530517#comment-13530517
]

stack commented on HBASE-7268:
--

bq. Do we want to consider supplying open timestamp from the master too?

Would that close the holes in this mechanism, the ones that we could have if
the server times diverge? Building a mechanism based on comparing server times
will work most of the time but there'll be folks who will have drifting clocks
and then we'll have new interesting issues.

How did this happen from your original report above? R moved from C to B

correct local region location cache information can be overwritten w/stale
information from an old server
-

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch

[jira] [Commented] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.


[ 
https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530522#comment-13530522
 ] 

Andrew Purtell commented on HBASE-7331:
---

bq. Fixed formatting errors.

Thanks. Looks like there may still be tabs in RegionServerCoprocessorHost but 
I'll fix that, no worries.

bq. TestLruCache fails for me intermittently

That would seem unrelated.

Running another round of tests with the updated patch to see what's up, if 
anything. 


 Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow 
 and stop region server. 
 --

 Key: HBASE-7331
 URL: https://issues.apache.org/jira/browse/HBASE-7331
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-7331_94.patch, HBASE-7331_trunk_02.patch, 
 HBASE-7331_trunk.patch


 The following APIs in HRegionServer are either missing hooks to coprocessor 
 or the hooks are not implemented in the AccessController class for security. 
 As a result any unauthorized user can:
 1. Open a region
 2. Close a region
 3. Stop region server
 4. Lock a row
 5. Unlock a row.


[jira] [Updated] (HBASE-7331) Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow and stop region server.

2012-12-12 Thread Vandana Ayyalasomayajula (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vandana Ayyalasomayajula updated HBASE-7331:


Attachment: HBASE-7331_94_02.patch

 Fix missing coprocessor hooks for openRegion, closeRegion, lockRow, unlockRow 
 and stop region server. 
 --

 Key: HBASE-7331
 URL: https://issues.apache.org/jira/browse/HBASE-7331
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver, security
Affects Versions: 0.94.3, 0.96.0
Reporter: Vandana Ayyalasomayajula
Assignee: Vandana Ayyalasomayajula
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-7331_94_02.patch, HBASE-7331_94.patch, 
 HBASE-7331_trunk_02.patch, HBASE-7331_trunk.patch


 The following APIs in HRegionServer are either missing hooks to coprocessor 
 or the hooks are not implemented in the AccessController class for security. 
 As a result any unauthorized user can:
 1. Open a region
 2. Close a region
 3. Stop region server
 4. Lock a row
 5. Unlock a row.


[jira] [Updated] (HBASE-7341) Deprecate RowLocks in 0.94

2012-12-12 Thread Gregory Chanan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-7341:
--

Attachment: HBASE-7341.patch

 Deprecate RowLocks in 0.94
 --

 Key: HBASE-7341
 URL: https://issues.apache.org/jira/browse/HBASE-7341
 Project: HBase
  Issue Type: Task
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.94.4

 Attachments: HBASE-7341.patch


 Since we are removing support in 0.96 (see HBASE-7315), we should deprecate 
 in 0.94.


[jira] [Updated] (HBASE-7341) Deprecate RowLocks in 0.94

2012-12-12 Thread Gregory Chanan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-7341:
--

Status: Patch Available  (was: Open)

 Deprecate RowLocks in 0.94
 --

 Key: HBASE-7341
 URL: https://issues.apache.org/jira/browse/HBASE-7341
 Project: HBase
  Issue Type: Task
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.94.4

 Attachments: HBASE-7341.patch


 Since we are removing support in 0.96 (see HBASE-7315), we should deprecate 
 in 0.94.


[jira] [Created] (HBASE-7343) Fix flaky condition for TestDrainingServer

Himanshu Vashishtha created HBASE-7343:
--

 Summary: Fix flaky condition for TestDrainingServer
 Key: HBASE-7343
 URL: https://issues.apache.org/jira/browse/HBASE-7343
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3
Reporter: Himanshu Vashishtha
Priority: Minor


The assert statement in setUpBeforeClass() may fail in case the region 
distribution is not even (a particular rs has 0 regions).

This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
and uses 5877, this jira fixes that issue instead of backporting 5992.
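One common way to make this kind of precondition less flaky is to poll until every region server holds at least one region before asserting. The sketch below only illustrates that idea and is not the attached patch: the class name, the method name, and the Supplier that reports per-server region counts are all hypothetical stand-ins for whatever the test actually queries.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical helper; not part of HBase or of the HBASE-7343 patch. */
final class RegionSpreadWait {

    /**
     * Polls regionCounts until every server reports at least one region.
     * Returns true on success, false if the timeout passes or the thread
     * is interrupted while waiting.
     */
    static boolean waitForEveryServerToHoldARegion(
            Supplier<Map<String, Integer>> regionCounts, long timeoutMs, long pollMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            Map<String, Integer> counts = regionCounts.get();
            // An empty map means no servers are known yet, so keep waiting.
            boolean allHaveRegions = !counts.isEmpty()
                && counts.values().stream().allMatch(n -> n > 0);
            if (allHaveRegions) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A setup method could call this helper and assert on its return value, so that a briefly uneven distribution leads to a retry rather than an immediate failure.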


[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement


[ 
https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530535#comment-13530535
 ] 

Andrew Purtell commented on HBASE-7340:
---

bq. I think we can either add notification per region moved, or enhance the 
following hook (at line 1335) with list of regions moved

I'd +1 a patch which does that.

 Allow user-specified actions following region movement
 --

 Key: HBASE-7340
 URL: https://issues.apache.org/jira/browse/HBASE-7340
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 Sometimes a user performs a compaction after a region is moved (by the balancer). We 
 should provide a 'hook' that lets the user specify what follow-on actions to take 
 after region movement.
 See discussion on user mailing list under the thread 'How to know it's time 
 for a major compaction?' for background information: 
 http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+


[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer


 [ 
https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-7343:
---

Description: 
The assert statement in setUpBeforeClass() may fail in case the region 
distribution is not even (a particular rs has 0 regions).

{code}
junit.framework.AssertionFailedError
at junit.framework.Assert.fail(Assert.java:48)
at junit.framework.Assert.assertTrue(Assert.java:20)
at junit.framework.Assert.assertFalse(Assert.java:34)
at junit.framework.Assert.assertFalse(Assert.java:41)
at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83)

{code}

This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
and uses 5877, this jira fixes that issue instead of backporting 5992.

  was:
The assert statement in setUpBeforeClass() may fail in case the region 
distribution is not even (a particular rs has 0 regions).

This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
and uses 5877, this jira fixes that issue instead of backporting 5992.


 Fix flaky condition for TestDrainingServer
 --

 Key: HBASE-7343
 URL: https://issues.apache.org/jira/browse/HBASE-7343
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3
Reporter: Himanshu Vashishtha
Priority: Minor
 Attachments: HBASE-7343.patch


 The assert statement in setUpBeforeClass() may fail in case the region 
 distribution is not even (a particular rs has 0 regions).
 {code}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertFalse(Assert.java:34)
   at junit.framework.Assert.assertFalse(Assert.java:41)
   at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83)
 {code}
 This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
 and uses 5877, this jira fixes that issue instead of backporting 5992.


[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer


 [ 
https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-7343:
---

Assignee: Himanshu Vashishtha
  Status: Patch Available  (was: Open)

 Fix flaky condition for TestDrainingServer
 --

 Key: HBASE-7343
 URL: https://issues.apache.org/jira/browse/HBASE-7343
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
Priority: Minor
 Attachments: HBASE-7343.patch


 The assert statement in setUpBeforeClass() may fail in case the region 
 distribution is not even (a particular rs has 0 regions).
 {code}
 junit.framework.AssertionFailedError
   at junit.framework.Assert.fail(Assert.java:48)
   at junit.framework.Assert.assertTrue(Assert.java:20)
   at junit.framework.Assert.assertFalse(Assert.java:34)
   at junit.framework.Assert.assertFalse(Assert.java:41)
   at org.apache.hadoop.hbase.TestDrainingServer.setUpBeforeClass(TestDrainingServer.java:83)
 {code}
 This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
 and uses 5877, this jira fixes that issue instead of backporting 5992.


[jira] [Updated] (HBASE-7343) Fix flaky condition for TestDrainingServer


 [ 
https://issues.apache.org/jira/browse/HBASE-7343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Himanshu Vashishtha updated HBASE-7343:
---

Attachment: HBASE-7343.patch

Tested in a loop and it passes.

 Fix flaky condition for TestDrainingServer
 --

 Key: HBASE-7343
 URL: https://issues.apache.org/jira/browse/HBASE-7343
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.94.3
Reporter: Himanshu Vashishtha
Priority: Minor
 Attachments: HBASE-7343.patch


 The assert statement in setUpBeforeClass() may fail in case the region 
 distribution is not even (a particular rs has 0 regions).
 This is already fixed in trunk with HBASE-5992, but as that's a bigger change 
 and uses 5877, this jira fixes that issue instead of backporting 5992.


[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530540#comment-13530540
]

Sergey Shelukhin commented on HBASE-7268:
-

Yes, as long as the clock on the master doesn't act funny. In my experience,
clocks cannot be trusted... maybe if we had a reliable sequence mechanism of
some kind.

In the original run, it happened because of multiple threads: one thread errors
out on A with "moved to B", errors out on B, goes to META, and updates the cache
with C; meanwhile, another thread has just errored out on A with "moved to B",
so it goes and overwrites C with B again.
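
To illustrate what a "reliable sequence mechanism" could look like on the client side, here is a minimal sketch of a location cache that only accepts an update if it carries a higher sequence number than the cached entry. This is an assumption-laden illustration, not the HBase client cache and not the attached patch: the class, its methods, and the idea that every location report comes with a monotonically increasing sequence number are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch; not HBase's actual region location cache. */
final class VersionedLocationCache {

    /** A cached location tagged with the sequence number it was reported with. */
    static final class Location {
        final String server;
        final long seq;

        Location(String server, long seq) {
            this.server = server;
            this.seq = seq;
        }
    }

    private final ConcurrentMap<String, Location> cache = new ConcurrentHashMap<>();

    /**
     * Stores the update only if it is newer than what is cached.
     * Returns true if the update was stored, false if it was stale.
     */
    boolean update(String region, String server, long seq) {
        Location candidate = new Location(server, seq);
        // merge() applies the comparison atomically per key, so two racing
        // threads cannot replace a newer entry with an older one.
        Location winner = cache.merge(region, candidate,
            (old, fresh) -> fresh.seq > old.seq ? fresh : old);
        return winner == candidate;
    }

    /** Returns the cached server for the region, or null if none is cached. */
    String get(String region) {
        Location loc = cache.get(region);
        return loc == null ? null : loc.server;
    }

    public static void main(String[] args) {
        VersionedLocationCache c = new VersionedLocationCache();
        c.update("r1", "C", 3); // fresh answer from META
        c.update("r1", "B", 2); // late "moved to B" hint from the old server
        System.out.println(c.get("r1")); // prints C
    }
}
```

In the scenario above, the second thread's late "moved to B" update would carry the lower sequence number and be dropped, so C would stay in the cache.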

correct local region location cache information can be overwritten w/stale
information from an old server
-

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch

[jira] [Commented] (HBASE-7268) correct local region location cache information can be overwritten w/stale information from an old server

[
https://issues.apache.org/jira/browse/HBASE-7268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530542#comment-13530542
]

Sergey Shelukhin commented on HBASE-7268:
-

Faulty removal due to errors happens in the same way... I think sleeping after
we get the location is also not good in that sense: we get some server and
sleep, then go to that server (on retries), and in the time we sleep the region
can move ten times.

correct local region location cache information can be overwritten w/stale
information from an old server
-

Attachments: HBASE-7268-v0.patch, HBASE-7268-v0.patch,
HBASE-7268-v1.patch, HBASE-7268-v2.patch

[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement


[ 
https://issues.apache.org/jira/browse/HBASE-7340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530554#comment-13530554
 ] 

Ted Yu commented on HBASE-7340:
---

@Andy:
Can you clarify which of the two choices listed you favor?
If we add notification per region moved, HMaster.balance() may move fewer 
regions compared to the current code - we don't know the amount of time each 
notification may take.
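
For comparison, here is a minimal sketch of the second choice, where a single hook call receives the list of regions moved once the balancing loop has finished, so user code never runs between individual moves. Every name in it (BalanceHookSketch, RegionMove, PostBalanceHook, postBalance) is hypothetical and is not an HBase API, and the region move itself is stubbed out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch of a batched post-balance hook; not HMaster code. */
final class BalanceHookSketch {

    /** One region movement: which region went from which server to which. */
    static final class RegionMove {
        final String region;
        final String source;
        final String destination;

        RegionMove(String region, String source, String destination) {
            this.region = region;
            this.source = source;
            this.destination = destination;
        }
    }

    /** User-supplied follow-on action, invoked once per balance run. */
    interface PostBalanceHook {
        void postBalance(List<RegionMove> moves);
    }

    /** Carries out all planned moves, then notifies the hook exactly once. */
    static List<RegionMove> balance(List<RegionMove> plans, PostBalanceHook hook) {
        List<RegionMove> done = new ArrayList<>();
        for (RegionMove plan : plans) {
            // A real balancer would move the region here; the sketch only
            // records the plan as completed.
            done.add(plan);
        }
        List<RegionMove> view = Collections.unmodifiableList(done);
        hook.postBalance(view);
        return view;
    }

    public static void main(String[] args) {
        List<RegionMove> plans = Arrays.asList(
            new RegionMove("r1", "rsA", "rsB"),
            new RegionMove("r2", "rsA", "rsC"));
        balance(plans, moves ->
            System.out.println("hook saw " + moves.size() + " moved regions"));
    }
}
```

Because the hook runs after the loop, a slow hook delays the end of the balance run but does not reduce the number of regions moved within it.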

 Allow user-specified actions following region movement
 --

 Key: HBASE-7340
 URL: https://issues.apache.org/jira/browse/HBASE-7340
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu

 Sometimes a user performs a compaction after a region is moved (by the balancer). We 
 should provide a 'hook' that lets the user specify what follow-on actions to take 
 after region movement.
 See discussion on user mailing list under the thread 'How to know it's time 
 for a major compaction?' for background information: 
 http://search-hadoop.com/m/BDx4S1jMjF92subj=How+to+know+it+s+time+for+a+major+compaction+


[jira] [Commented] (HBASE-7340) Allow user-specified actions following region movement