[jira] [Commented] (HBASE-6877) Coprocessor exec result is incorrect when region is in splitting

2013-03-06 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594491#comment-13594491
 ] 

Anoop Sam John commented on HBASE-6877:
---

[~apurtell] I have not hit this issue myself. I noticed it while looking into 
HBASE-6870. I am also going through the coprocessorExec code as part of 
investigating test failures on trunk in the examples module (BulkDelete). I will 
be able to give the reasoning there after some more checking.

 Coprocessor exec result is incorrect when region is in splitting 
 -

 Key: HBASE-6877
 URL: https://issues.apache.org/jira/browse/HBASE-6877
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.94.1
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Attachments: HBASE-6877.patch


 When we execute a coprocessor, we first call HTable#getStartKeysInRange to 
 get the keys on which to execute it. If some regions are split before the 
 execCoprocessor RPC is issued, those keys are stale and the result we get is 
 incomplete. 
 For example:
 the parent region is split into daughter region A and daughter region B;
 we execute the coprocessor on the parent region, but the result covers only 
 daughter region A or daughter region B.
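
The race described above can be reproduced with a toy model. The sketch below is not the HBase client API: the class StaleRegionKeys and its methods are hypothetical, and the "coprocessor" is just a row count. It only assumes that a call made with a start key lands on whichever region currently owns that key.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a table whose regions are keyed by start row; not the HBase client API. */
class StaleRegionKeys {
    // start key -> rows currently hosted by the region that begins at that key
    private final TreeMap<String, List<String>> regions = new TreeMap<>();

    void addRegion(String startKey, List<String> rows) {
        regions.put(startKey, new ArrayList<>(rows));
    }

    /** Snapshot of region start keys, as a client would compute them before issuing calls. */
    List<String> startKeys() {
        return new ArrayList<>(regions.keySet());
    }

    /** Splits the region containing splitKey into two daughters at splitKey. */
    void split(String splitKey) {
        Map.Entry<String, List<String>> parent = regions.floorEntry(splitKey);
        List<String> lower = new ArrayList<>();
        List<String> upper = new ArrayList<>();
        for (String row : parent.getValue()) {
            (row.compareTo(splitKey) < 0 ? lower : upper).add(row);
        }
        regions.put(parent.getKey(), lower); // daughter A keeps the parent's start key
        regions.put(splitKey, upper);        // daughter B starts at the split point
    }

    /** Runs a row-count "endpoint" once per supplied key, on the region that now owns the key. */
    int countRows(List<String> keys) {
        int total = 0;
        for (String key : keys) {
            total += regions.floorEntry(key).getValue().size();
        }
        return total;
    }

    public static void main(String[] args) {
        StaleRegionKeys table = new StaleRegionKeys();
        table.addRegion("", Arrays.asList("a", "b", "m", "n"));

        List<String> cachedKeys = table.startKeys(); // computed before the split
        table.split("m");                            // parent -> daughters A and B

        System.out.println("stale keys: " + table.countRows(cachedKeys));
        System.out.println("fresh keys: " + table.countRows(table.startKeys()));
    }
}
```

With the keys computed before the split, only daughter A is visited and half of the rows are missed; recomputing the keys after the split covers both daughters.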

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7981) TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 0.95 build #11

2013-03-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-7981:
--

Status: Patch Available  (was: Open)

 TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 
 0.95 build #11
 

 Key: HBASE-7981
 URL: https://issues.apache.org/jira/browse/HBASE-7981
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.95.0

 Attachments: 7981.extra.logging.txt, 7981.extra.logging.txt, 
 7981v2.extra.logging.txt, HBASE-7981.patch


 https://builds.apache.org/job/hbase-0.95/11/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/
 Hard to tell which region is missing post crash.  Not logged.



[jira] [Updated] (HBASE-7981) TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 0.95 build #11

2013-03-06 Thread ramkrishna.s.vasudevan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-7981:
--

Attachment: HBASE-7981.patch

This patch fixes the failure of the test 
testShouldThrowIOExceptionIfStoreFileSizeIsEmptyAndShouldSuccessfullyExecuteRollback



[jira] [Commented] (HBASE-7981) TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 0.95 build #11

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594572#comment-13594572
 ] 

Hadoop QA commented on HBASE-7981:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572291/HBASE-7981.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.backup.TestHFileArchiving

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4695//console

This message is automatically generated.



[jira] [Updated] (HBASE-7845) optimize hfile index key

2013-03-06 Thread Liang Xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Xie updated HBASE-7845:
-

Attachment: HBASE-7845-v3.txt

I have a working version on our internal 0.94.3 codebase. The attached v3 is a 
patch for trunk; I merged it manually. Let's see what the QA robot says :)

 optimize hfile index key
 

 Key: HBASE-7845
 URL: https://issues.apache.org/jira/browse/HBASE-7845
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Affects Versions: 0.96.0
Reporter: Liang Xie
Assignee: Liang Xie
 Attachments: HBASE-7845.txt, HBASE-7845-v2.txt, HBASE-7845-v3.txt


 LevelDB uses ByteWiseComparatorImpl::FindShortestSeparator() and 
 FindShortSuccessor() to reduce index key size; it would be helpful under 
 special conditions.
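
For readers unfamiliar with the idea, here is a minimal sketch of a shortest-separator heuristic on byte arrays. It follows my understanding of what LevelDB's bytewise comparator does and is not taken from the attached patch; the class and method names are made up for illustration. Given the last key of one block (start) and the first key of the next (limit), it looks for a shorter key k with start <= k < limit that can stand in as the index entry.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only; names are hypothetical and not part of HBase or LevelDB. */
class ShortSeparator {

    /** Returns a key k with start <= k < limit under unsigned bytewise ordering. */
    static byte[] shortestSeparator(byte[] start, byte[] limit) {
        int minLen = Math.min(start.length, limit.length);
        int diff = 0;
        while (diff < minLen && start[diff] == limit[diff]) {
            diff++; // skip the common prefix
        }
        if (diff >= minLen) {
            return start.clone(); // one key is a prefix of the other: nothing to shorten
        }
        int s = start[diff] & 0xff;
        int l = limit[diff] & 0xff;
        if (s < 0xff && s + 1 < l) {
            // Bump the first differing byte and drop everything after it.
            byte[] out = Arrays.copyOf(start, diff + 1);
            out[diff] = (byte) (s + 1);
            return out;
        }
        return start.clone(); // no room between the two bytes: keep the full key
    }

    public static void main(String[] args) {
        byte[] start = "helloworld".getBytes(StandardCharsets.US_ASCII);
        byte[] limit = "hellozoo".getBytes(StandardCharsets.US_ASCII);
        byte[] sep = shortestSeparator(start, limit);
        System.out.println(new String(sep, StandardCharsets.US_ASCII)); // prints "hellox"
    }
}
```

The gain depends on the keys: when adjacent keys differ early and have long tails the index entry shrinks a lot, while keys that differ only in the last byte are left unchanged.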



[jira] [Commented] (HBASE-7928) Scanning .META. with startRow and/or stopRow is not giving proper results

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594589#comment-13594589
 ] 

Hudson commented on HBASE-7928:
---

Integrated in hbase-0.95 #29 (See 
[https://builds.apache.org/job/hbase-0.95/29/])
HBASE-7928 Scanning .META. with startRow and/or stopRow is not giving 
proper results; REVERT.  NEEDS SOME MORE WORK (Revision 1453177)

 Result = FAILURE
stack : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java


 Scanning .META. with startRow and/or stopRow is not giving proper results
 -

 Key: HBASE-7928
 URL: https://issues.apache.org/jira/browse/HBASE-7928
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.94.5
Reporter: Jean-Marc Spaggiari
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-7928_0.94_1.patch, HBASE-7928_0.94_2.patch, 
 HBASE-7928_0.94_3.patch, HBASE-7928_0.94_3.patch, HBASE-7928_0.94.patch, 
 HBASE-7928_trunk_1.patch, HBASE-7928_trunk_2.patch, HBASE-7928_trunk_2.patch, 
 HBASE-7928_trunk.patch


 {code}
 try {
   // Open the catalog table directly and scan a bounded key range.
   HTable metaTable = new HTable(config, Bytes.toBytes(".META."));
   Scan scan = new Scan();
   scan.setStartRow(Bytes.toBytes("e"));
   scan.setStopRow(Bytes.toBytes("z"));
   ResultScanner scanner = metaTable.getScanner(scan);
   // Fetch in batches of 100 until the scanner is exhausted.
   Result[] results = scanner.next(100);
   while (results.length > 0) {
     for (Result result : results) {
       System.out.println(Bytes.toString(result.getRow()));
     }
     results = scanner.next(100);
   }
   scanner.close();
   metaTable.close();
 } catch (Exception e) {
   e.printStackTrace();
 }
 {code}
 This code will not return any result even if there are 10 tables with names 
 starting with "d" to "w", including one table called "entry". If you comment 
 out the setStopRow call you will get results, but you will still get rows 
 starting with "d" even though setStartRow is set to "e".
 The same code works fine with a user table.
 The same issue occurs with the shell:
 scan '.META.', {STARTROW => 'e', LIMIT => 10} returns rows starting with "d".
 scan '.META.', {STARTROW => 'e', STOPROW => 'v', LIMIT => 10} returns 
 nothing.



[jira] [Commented] (HBASE-7928) Scanning .META. with startRow and/or stopRow is not giving proper results

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594608#comment-13594608
 ] 

Hudson commented on HBASE-7928:
---

Integrated in HBase-TRUNK #3918 (See 
[https://builds.apache.org/job/HBase-TRUNK/3918/])
HBASE-7928 Scanning .META. with startRow and/or stopRow is not giving 
proper results; REVERT.  NEEDS SOME MORE WORK (Revision 1453175)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java




[jira] [Commented] (HBASE-4966) Put/Delete values cannot be tested with MRUnit

2013-03-06 Thread Adamos Loizou (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594621#comment-13594621
 ] 

Adamos Loizou commented on HBASE-4966:
--

Hello, I've also experienced this issue with the latest 0.94.5 version. As 
Philip pointed out, MRUnit assumes that equals() and hashCode() are overridden 
in order to validate the actual output against the expected output. Perhaps it 
could be considered an MRUnit shortcoming that it does not allow intercepting 
and custom-matching the actual output. I have worked around the problem with a 
test-only implementation of Put that enforces the contract; we use that 
implementation for tests and the default one for production. We do quite a bit 
of map/reduce with HBase on my team, so it would be really helpful to get this 
fixed. I've seen this issue with Hadoop MapWritable as well, and the convention 
there seems to be to try to meet the equals()/hashCode() contract.

 Put/Delete values cannot be tested with MRUnit
 --

 Key: HBASE-4966
 URL: https://issues.apache.org/jira/browse/HBASE-4966
 Project: HBase
  Issue Type: Bug
  Components: Client, mapreduce
Affects Versions: 0.90.4
Reporter: Nicholas Telford
Assignee: Nicholas Telford
Priority: Minor

 When using the IdentityTableReducer, which expects input values of either a 
 Put or Delete object, testing the Mapper with MRUnit is not possible because 
 neither Put nor Delete implements equals().
 We should implement equals() on both such that equality means:
 * Both objects are of the same class (in this case, Put or Delete)
 * Both objects are for the same key.
 * Both objects contain an equal set of KeyValues (applicable only to Put)
 KeyValue.equals() appears to already be implemented, but only checks for 
 equality of row key, column family and column qualifier: two KeyValues can 
 be considered equal even if they contain different values. This won't work for 
 testing.
 Instead, the Put.equals() and Delete.equals() implementations should do a 
 deep equality check on their KeyValues, like this:
 {code:java}
 myKv.equals(theirKv) && Bytes.equals(myKv.getValue(), theirKv.getValue());
 {code}
 NOTE: This would impact any code that relies on the existing identity 
 implementation of Put.equals() and Delete.equals(), therefore cannot be 
 guaranteed to be backwards-compatible.
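
To make the proposal concrete, below is a minimal sketch of value-based equality using stand-in types. Cell and FakePut are hypothetical classes written for this example, not the HBase KeyValue or Put classes, and the cell comparison is order-sensitive for brevity, whereas the description above asks for an equal set of KeyValues.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Stand-in types for illustration only; these are not the HBase Put or KeyValue classes. */
class DeepEqualsSketch {

    /** Hypothetical cell: coordinates (row, family, qualifier) plus a value. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;
        final byte[] value;

        Cell(String row, String family, String qualifier, byte[] value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }

        /** Shallow check as described above: coordinates only, value ignored. */
        boolean sameCoordinates(Cell other) {
            return row.equals(other.row)
                && family.equals(other.family)
                && qualifier.equals(other.qualifier);
        }

        /** Deep check: coordinates and value bytes must both match. */
        boolean deepEquals(Cell other) {
            return sameCoordinates(other) && Arrays.equals(value, other.value);
        }

        int deepHash() {
            return 31 * Objects.hash(row, family, qualifier) + Arrays.hashCode(value);
        }
    }

    /** Hypothetical Put-like mutation whose equality is value-based, not identity-based. */
    static final class FakePut {
        final String row;
        final List<Cell> cells;

        FakePut(String row, List<Cell> cells) {
            this.row = row;
            this.cells = cells;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (obj == null || getClass() != obj.getClass()) return false; // same class
            FakePut other = (FakePut) obj;
            if (!row.equals(other.row) || cells.size() != other.cells.size()) return false;
            // Simplification: cells are compared in order rather than as a set.
            for (int i = 0; i < cells.size(); i++) {
                if (!cells.get(i).deepEquals(other.cells.get(i))) return false;
            }
            return true;
        }

        @Override
        public int hashCode() {
            int h = row.hashCode();
            for (Cell c : cells) h = 31 * h + c.deepHash();
            return h;
        }
    }

    public static void main(String[] args) {
        FakePut a = new FakePut("r1", Arrays.asList(new Cell("r1", "cf", "q", new byte[] {1})));
        FakePut b = new FakePut("r1", Arrays.asList(new Cell("r1", "cf", "q", new byte[] {1})));
        FakePut c = new FakePut("r1", Arrays.asList(new Cell("r1", "cf", "q", new byte[] {2})));
        System.out.println(a.equals(b)); // same coordinates, same value
        System.out.println(a.equals(c)); // same coordinates, different value
    }
}
```

Overriding hashCode() alongside equals() keeps the two consistent, which matters for any test harness that compares outputs through hash-based collections.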



[jira] [Commented] (HBASE-7845) optimize hfile index key

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594629#comment-13594629
 ] 

Hadoop QA commented on HBASE-7845:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572305/HBASE-7845-v3.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 3 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestMasterFailover
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.regionserver.TestSplitLogWorker

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4696//console

This message is automatically generated.



[jira] [Updated] (HBASE-7415) [snapshots] Add task information to snapshot operation

2013-03-06 Thread Matteo Bertozzi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matteo Bertozzi updated HBASE-7415:
---

Attachment: HBASE-7415-v2.patch

 [snapshots] Add task information to snapshot operation
 --

 Key: HBASE-7415
 URL: https://issues.apache.org/jira/browse/HBASE-7415
 Project: HBase
  Issue Type: New Feature
  Components: Client, master, regionserver, snapshots, Zookeeper
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.95.0

 Attachments: hbase-7415-v0.patch, hbase-7415-v1.patch, 
 HBASE-7415-v1-rebase.patch, HBASE-7415-v2.patch, HBase_Snapshot_Task_UI.png


 Snapshot operations should have some sort of progress information available 
 via the WebUI so admins can track progress of operations. This should go a 
 long way to enable 'good' admins to not hose their clusters by running 
 concurrent snapshot operations (e.g. rename while a clone is in progress).



[jira] [Updated] (HBASE-3996) d

2013-03-06 Thread Ron Buckley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Buckley updated HBASE-3996:
---

Summary: d  (was: Support multiple tables and scanners as input to the 
mapper in map/reduce jobs)

 d
 -

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Bryan Baugher
Priority: Critical
 Fix For: 0.95.0, 0.94.5

 Attachments: 3996-0.94.txt, 3996-v10.txt, 3996-v11.txt, 3996-v12.txt, 
 3996-v13.txt, 3996-v14.txt, 3996-v15.txt, 3996-v2.txt, 3996-v3.txt, 
 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, 
 HBase-3996.patch


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.



[jira] [Updated] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs

2013-03-06 Thread Ron Buckley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ron Buckley updated HBASE-3996:
---

Summary:  Support multiple tables and scanners as input to the mapper in 
map/reduce jobs  (was: d)

  Support multiple tables and scanners as input to the mapper in map/reduce 
 jobs
 ---

 Key: HBASE-3996
 URL: https://issues.apache.org/jira/browse/HBASE-3996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Eran Kutner
Assignee: Bryan Baugher
Priority: Critical
 Fix For: 0.95.0, 0.94.5

 Attachments: 3996-0.94.txt, 3996-v10.txt, 3996-v11.txt, 3996-v12.txt, 
 3996-v13.txt, 3996-v14.txt, 3996-v15.txt, 3996-v2.txt, 3996-v3.txt, 
 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, 
 HBase-3996.patch


 It seems that in many cases feeding data from multiple tables or multiple 
 scanners on a single table can save a lot of time when running map/reduce 
 jobs.
 I propose a new MultiTableInputFormat class that would allow doing this.



[jira] [Commented] (HBASE-7415) [snapshots] Add task information to snapshot operation

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594693#comment-13594693
 ] 

Hadoop QA commented on HBASE-7415:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572312/HBASE-7415-v2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailover
   org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4697//console

This message is automatically generated.



[jira] [Commented] (HBASE-7928) Scanning .META. with startRow and/or stopRow is not giving proper results

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594704#comment-13594704
 ] 

Hudson commented on HBASE-7928:
---

Integrated in hbase-0.95-on-hadoop2 #13 (See 
[https://builds.apache.org/job/hbase-0.95-on-hadoop2/13/])
HBASE-7928 Scanning .META. with startRow and/or stopRow is not giving 
proper results; REVERT.  NEEDS SOME MORE WORK (Revision 1453177)

 Result = FAILURE
stack : 
Files : 
* /hbase/branches/0.95/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/branches/0.95/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/branches/0.95/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java




[jira] [Commented] (HBASE-7928) Scanning .META. with startRow and/or stopRow is not giving proper results

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594707#comment-13594707
 ] 

Hudson commented on HBASE-7928:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #432 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/432/])
HBASE-7928 Scanning .META. with startRow and/or stopRow is not giving 
proper results; REVERT.  NEEDS SOME MORE WORK (Revision 1453175)

 Result = FAILURE
stack : 
Files : 
* /hbase/trunk/hbase-client/src/main/java/org/apache/hadoop/hbase/client/HTable.java
* /hbase/trunk/hbase-common/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java


 Scanning .META. with startRow and/or stopRow is not giving proper results
 -

 Key: HBASE-7928
 URL: https://issues.apache.org/jira/browse/HBASE-7928
 Project: HBase
  Issue Type: Bug
  Components: Usability
Affects Versions: 0.94.5
Reporter: Jean-Marc Spaggiari
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-7928_0.94_1.patch, HBASE-7928_0.94_2.patch, 
 HBASE-7928_0.94_3.patch, HBASE-7928_0.94_3.patch, HBASE-7928_0.94.patch, 
 HBASE-7928_trunk_1.patch, HBASE-7928_trunk_2.patch, HBASE-7928_trunk_2.patch, 
 HBASE-7928_trunk.patch





[jira] [Updated] (HBASE-7904) Upgrade hadoop 2.0 dependency to 2.0.4-alpha

2013-03-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7904:
--

Attachment: hbase-7904-v3.txt

Patch v3 allows mapreduce.TestTableMapReduce#testMultiRegionTable to pass on 
both hadoop 1.0 and 2.0.4-SNAPSHOT (with YARN-429)

With this change, similar changes (to YARN-129) in the future won't break HBase 
unit tests.

 Upgrade hadoop 2.0 dependency to 2.0.4-alpha
 

 Key: HBASE-7904
 URL: https://issues.apache.org/jira/browse/HBASE-7904
 Project: HBase
  Issue Type: Task
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.95.0

 Attachments: 7904.txt, 7904-v2.txt, hbase-7904-v3.txt


 2.0.3-alpha has been released.
 We should upgrade the dependency.



[jira] [Updated] (HBASE-7845) optimize hfile index key

2013-03-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-7845:
--

Fix Version/s: 0.98.0

 optimize hfile index key
 

 Key: HBASE-7845
 URL: https://issues.apache.org/jira/browse/HBASE-7845
 Project: HBase
  Issue Type: Improvement
  Components: HFile
Affects Versions: 0.96.0
Reporter: Liang Xie
Assignee: Liang Xie
 Fix For: 0.98.0

 Attachments: HBASE-7845.txt, HBASE-7845-v2.txt, HBASE-7845-v3.txt


 LevelDB uses ByteWiseComparatorImpl::FindShortestSeparator() and 
 FindShortSuccessor() to reduce index key size; this would be helpful under 
 special conditions.
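A minimal Java sketch of the two LevelDB helpers named above, written from my understanding of LevelDB's bytewise comparator rather than from the attached patch. IndexKeyShortener and its method names are hypothetical; the idea is that an index entry only needs some key that sorts at or after the last key of one block and before the first key of the next, so a shorter key can stand in for the full one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not HBase code: shortens index keys the way
// LevelDB's bytewise comparator is understood to do.
final class IndexKeyShortener {

  /**
   * Returns a key k with start <= k < limit that is as short as this simple
   * rule allows, or start itself when no shortening is possible.
   */
  static byte[] findShortestSeparator(byte[] start, byte[] limit) {
    int minLen = Math.min(start.length, limit.length);
    int diff = 0;
    while (diff < minLen && start[diff] == limit[diff]) {
      diff++;
    }
    if (diff >= minLen) {
      return start; // one key is a prefix of the other: leave unchanged
    }
    int b = start[diff] & 0xff;
    if (b < 0xff && b + 1 < (limit[diff] & 0xff)) {
      // Bump the first differing byte and drop everything after it.
      byte[] out = Arrays.copyOf(start, diff + 1);
      out[diff] = (byte) (b + 1);
      return out;
    }
    return start;
  }

  /** Returns a short key k with k >= key, or key itself if it is all 0xff bytes. */
  static byte[] findShortSuccessor(byte[] key) {
    for (int i = 0; i < key.length; i++) {
      int b = key[i] & 0xff;
      if (b != 0xff) {
        byte[] out = Arrays.copyOf(key, i + 1);
        out[i] = (byte) (b + 1);
        return out;
      }
    }
    return key;
  }

  public static void main(String[] args) {
    byte[] sep = findShortestSeparator(
        "abcdefg".getBytes(StandardCharsets.US_ASCII),
        "abzzzzz".getBytes(StandardCharsets.US_ASCII));
    System.out.println(new String(sep, StandardCharsets.US_ASCII));
  }
}
```

The "special conditions" are presumably long row keys that share short prefixes with their neighbours: the saving is large there, and nil when adjacent keys differ only in their last byte.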



[jira] [Commented] (HBASE-7981) TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 0.95 build #11

2013-03-06 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594873#comment-13594873
 ] 

stack commented on HBASE-7981:
--

[~ram_krish] Tell me more about this patch?  Ups the timeout.  Fixes a failing 
test?  Which build?  Thanks.

 TestSplitTransactionOnCluster.testShutdownFixupWhenDaughterHasSplit failed in 
 0.95 build #11
 

 Key: HBASE-7981
 URL: https://issues.apache.org/jira/browse/HBASE-7981
 Project: HBase
  Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.95.0

 Attachments: 7981.extra.logging.txt, 7981.extra.logging.txt, 
 7981v2.extra.logging.txt, HBASE-7981.patch


 https://builds.apache.org/job/hbase-0.95/11/testReport/junit/org.apache.hadoop.hbase.regionserver/TestSplitTransactionOnCluster/testShutdownFixupWhenDaughterHasSplit/
 Hard to tell which region is missing post crash.  Not logged.



[jira] [Created] (HBASE-8013) TestZKProcedureControllers fails intermittently in trunk builds

2013-03-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-8013:
-

 Summary: TestZKProcedureControllers fails intermittently in trunk 
builds
 Key: HBASE-8013
 URL: https://issues.apache.org/jira/browse/HBASE-8013
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


See 
https://builds.apache.org/job/HBase-TRUNK/3918/testReport/org.apache.hadoop.hbase.procedure/TestZKProcedureControllers/testSimpleZKCohortMemberController/

This seems to be the reason:
{code}
2013-03-06 10:35:31,088 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(218): Illegal argument exception
java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure.
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
2013-03-06 10:35:31,090 ERROR [Thread-2-EventThread] procedure.ZKProcedureMemberRpcs(281): Failed due to null subprocedure
java.lang.IllegalArgumentException via expected:java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure.
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:219)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.waitForNewProcedures(ZKProcedureMemberRpcs.java:175)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.access$100(ZKProcedureMemberRpcs.java:56)
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs$1.nodeChildrenChanged(ZKProcedureMemberRpcs.java:109)
    at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:312)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: java.lang.IllegalArgumentException: Data in for starting procuedure instanceTest is illegally formatted. Killing the procedure.
    at org.apache.hadoop.hbase.procedure.ZKProcedureMemberRpcs.startNewSubprocedure(ZKProcedureMemberRpcs.java:211)
    ... 6 more
{code}



[jira] [Created] (HBASE-8014) Backport HBASE-6915 to 0.94.

2013-03-06 Thread Jean-Marc Spaggiari (JIRA)
Jean-Marc Spaggiari created HBASE-8014:
--

 Summary: Backport HBASE-6915 to 0.94.
 Key: HBASE-8014
 URL: https://issues.apache.org/jira/browse/HBASE-8014
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Critical


JDK 1.7 changed some data sizes. The goal of this JIRA is to backport HBASE-6915 
to 0.94.



[jira] [Commented] (HBASE-7403) Online Merge

2013-03-06 Thread Daniel Einspanjer (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594883#comment-13594883
 ] 

Daniel Einspanjer commented on HBASE-7403:
--

What is the likelihood of being able to apply this patch to run against CDH 
0.90.6-cdh3u4?

The codebase has changed too much for the old online_merge.rb script to run 
without significant overhauling, and we need to do some region reduction sooner 
than we'd be able to upgrade to CDH4.

I'm just wondering if it is a worthy task to investigate or if it would be a 
deep dark rabbit hole. :) 

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.95.0, 0.94.6
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.0

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, 
 hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, 
 hbase-7403-trunkv13.patch, hbase-7403-trunkv14.patch, 
 hbase-7403-trunkv15.patch, hbase-7403-trunkv16.patch, 
 hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, 
 hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, 
 merge region.pdf


 Features of this online merge:
 1. Online: no need to disable the table.
 2. Few changes to current code; could be applied to trunk, 0.94, 0.92 or 0.90.
 3. Easy to issue a merge request: no need to input a long region name, the 
 encoded name is enough.
 4. No restrictions during operation: you don't need to take care of events like 
 Server Dead, Balance, Split, Disabling/Enabling table, or worry about 
 sending a wrong merge request; that is already handled for you.
 5. Only a short offline time for the two merging regions.
 Usage:
 1. Tool:
 bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] 
 table-name region-encodedname-1 region-encodedname-2
 2. API: static void MergeManager#createMergeRequest
 We need merge in the following cases:
 1. Region holes or region overlaps that can’t be fixed by hbck.
 2. Regions that become empty because of TTL or an unreasonable rowkey design.
 3. Regions that are always empty or very small because of presplitting at table 
 creation.
 4. Too many empty or small regions reduce system performance (e.g. 
 mslab).
 The current merge tools only work offline and cannot redo the merge if an 
 exception is thrown part-way through, which leaves dirty data.
 For an online system, we need an online merge.
 The implementation logic of this patch for online merge is as follows.
 For example, to merge regionA and regionB into regionC:
 1. Offline the two regions A and B.
 2. Merge the two regions in HDFS (create regionC’s directory, move 
 regionA’s and regionB’s files to regionC’s directory, delete regionA’s and 
 regionB’s directories).
 3. Add the merged regionC to .META.
 4. Assign the merged regionC.
 By design, once we do the merge work in HDFS, it can be redone 
 until it succeeds if it throws an exception, aborts, or the server restarts, but 
 it cannot be rolled back.
 It depends on:
 - Using ZooKeeper to record the transaction journal state, which makes redo easier
 - Using ZooKeeper to send/receive merge requests
 - Executing the merge transaction on the master
 - Supporting merge requests through the API or the shell tool
 For details of the merge process, please see the attachment and patch.
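The redo-until-success behaviour described above can be sketched as a small journaled state machine. This is an illustration only, not the patch's code: MergeRedoSketch, Step, Journal and StepAction are hypothetical names, and the in-memory Journal merely stands in for the journal state the description says is recorded in ZooKeeper.

```java
// Hypothetical sketch of a redoable four-step merge; not taken from the patch.
final class MergeRedoSketch {

  /** The four steps listed in the description, in order. */
  enum Step { OFFLINE_REGIONS, MERGE_IN_HDFS, ADD_TO_META, ASSIGN_MERGED }

  /** In-memory stand-in for the journal state kept in ZooKeeper. */
  static final class Journal {
    private int completed = 0; // number of steps already finished

    int completed() {
      return completed;
    }

    void markDone(Step step) {
      completed = step.ordinal() + 1;
    }
  }

  /** The real work of one step; assumed safe to run again after a failure. */
  interface StepAction {
    void run(Step step) throws Exception;
  }

  /**
   * Runs every step not yet journaled as done. Returns true when the merge is
   * complete; returns false on the first failure, leaving the journal so that
   * a later call resumes at the failed step instead of rolling back.
   */
  static boolean runOrResume(Journal journal, StepAction action) {
    Step[] steps = Step.values();
    for (int i = journal.completed(); i < steps.length; i++) {
      try {
        action.run(steps[i]);
      } catch (Exception e) {
        return false; // caller may redo later, e.g. after a master restart
      }
      journal.markDone(steps[i]);
    }
    return true;
  }
}
```

The design choice this mirrors is that steps are redone forward rather than undone, which only works if each step tolerates being executed twice (for example, moving files that may already have been moved).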



[jira] [Updated] (HBASE-8014) Backport HBASE-6915 to 0.94.

2013-03-06 Thread Jean-Marc Spaggiari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Marc Spaggiari updated HBASE-8014:
---

Attachment: HBASE-8014-v0-0.94.patch

 Backport HBASE-6915 to 0.94.
 

 Key: HBASE-8014
 URL: https://issues.apache.org/jira/browse/HBASE-8014
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Critical
 Attachments: HBASE-8014-v0-0.94.patch


 JDK 1.7 changed some data sizes. The goal of this JIRA is to backport HBASE-6915 
 to 0.94.



[jira] [Created] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)
Francis Liu created HBASE-8015:
--

 Summary: Support for Namespaces
 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: Bug
Reporter: Francis Liu
Assignee: Francis Liu






[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594917#comment-13594917
 ] 

Sergey Shelukhin commented on HBASE-8015:
-

+1 on making core (as opposed to cp-s I assume?)
Given the level of autonomy namespaces will provide tenants.  what does this 
mean?
What kind of resource allocation will be possible, except for server groups 
(just examples, to understand it better)?

Then, about storing, I suggest not making it part of table name. It seems 
brittle, and will limit our options if we want to add features lately due to 
backward compat.
Also, how do we handle existing backward compat if someone already has a table 
name foo.bar?
Will root/meta have to be renamed hbase.root/hbase.meta to be in correct 
namespace, and how will it affect current assumptions about sorting if yes?


 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594938#comment-13594938
 ] 

Jeffrey Zhong commented on HBASE-6772:
--

[~te...@apache.org] Thanks for commenting on this.

The sleepTime is only there for retries: it is used only when 
listChildrenAndWatchForNewChildren hits errors or splitLogZNode doesn't exist. 

In the normal case, the following lines set a watch on splitLogZNode and return. 
ZooKeeper will notify region servers to grab a task as soon as a new split task 
is saved into ZK.

{code}
childrenPaths = ZKUtil.listChildrenAndWatchForNewChildren(this.watcher,
this.watcher.splitLogZNode);
if (childrenPaths != null) {
  return childrenPaths;
}
{code} 
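The retry role of sleepTime can be sketched as follows. This is a simplified stand-in, not the SplitLogWorker code: WatchRetrySketch, listWithRetry and the maxAttempts bound are assumptions made for illustration, and the Callable stands in for the listChildrenAndWatchForNewChildren call shown above.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch: sleep only between failed attempts, never on success.
final class WatchRetrySketch {

  /**
   * Calls lister until it returns a non-null list. A null result or an
   * exception is treated as "retry after sleepTimeMs". Returns null if
   * maxAttempts is exhausted or the thread is interrupted while sleeping.
   */
  static List<String> listWithRetry(Callable<List<String>> lister,
                                    long sleepTimeMs, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        List<String> children = lister.call();
        if (children != null) {
          return children; // normal case: watch is set, no sleep involved
        }
      } catch (Exception e) {
        // A failed listing is handled like a missing znode: retry after the sleep.
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(sleepTimeMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          return null;
        }
      }
    }
    return null;
  }
}
```

In this shape the sleep adds no latency to task pickup in the healthy path, which is the point the comment makes.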

 Make the Distributed Split HDFS Location aware
 --

 Key: HBASE-6772
 URL: https://issues.apache.org/jira/browse/HBASE-6772
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Jeffrey Zhong

 During an hlog split, each log file (a single hdfs block) is allocated to a 
 different region server. This region server reads the file and creates the 
 recovery edit files.
 The allocation to the region server is random. We could take into account the 
 locations of the log file to split:
 - the reads would be local, hence faster. This allows short circuit as well.
 - less network i/o used during a failure (and this is important)
 - we would be sure to read from a working datanode, hence we're sure we won't 
 have read errors. Read errors slow the split process a lot, as we often end up 
 waiting for timeouts.
 We need to limit the calls to the namenode however.
 A typical algorithm could be:
 - the master gets the locations of the hlog files
 - it writes them into ZK, if possible in one transaction (this way all the 
 tasks are visible all together, allowing some arbitration by the region server).
 - when the regionserver receives the event, it checks all logs and all 
 locations.
 - if there is a match, it takes it
 - if not, it waits something like 0.2s (to give other regionservers time 
 to take it if the location matches), and takes any remaining task.
 Drawbacks are:
 - a 0.2s delay added if there is no regionserver available on one of the 
 locations. It's likely possible to remove it with some extra synchronization.
 - Small increase in complexity and dependency on HDFS
 Considering the advantages, it's worth it imho.
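The task-picking rule in the proposed algorithm can be sketched like this. It is a hypothetical illustration, not code from any patch: LocalityPickSketch and pickTask are made-up names, the map of task to replica hosts stands in for the locations the master would write into ZK, and the graceElapsed flag stands in for the 0.2s wait.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of locality-first split task selection.
final class LocalityPickSketch {

  /**
   * Returns the first task whose log has a replica on this host. If there is
   * none, returns any remaining task once the grace period has elapsed, and
   * null otherwise (meaning: wait, another region server may be local).
   */
  static String pickTask(Map<String, Set<String>> taskLocations,
                         String host, boolean graceElapsed) {
    for (Map.Entry<String, Set<String>> entry : taskLocations.entrySet()) {
      if (entry.getValue().contains(host)) {
        return entry.getKey(); // local read, possibly short-circuit
      }
    }
    if (graceElapsed && !taskLocations.isEmpty()) {
      return taskLocations.keySet().iterator().next();
    }
    return null;
  }
}
```

A caller would invoke pickTask once with graceElapsed set to false when the ZK event arrives and, if it gets null, again with true after the delay; the first drawback listed above corresponds to that second call.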



[jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware

2013-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594941#comment-13594941
 ] 

Ted Yu commented on HBASE-6772:
---

Do we need to accommodate other types of delay on SplitLogWorker, such as 
network congestion, GC pauses, etc.?

 Make the Distributed Split HDFS Location aware
 --

 Key: HBASE-6772
 URL: https://issues.apache.org/jira/browse/HBASE-6772
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Jeffrey Zhong




[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594943#comment-13594943
 ] 

Francis Liu commented on HBASE-8015:


{quote}
+1 on making core (as opposed to cp-s I assume?)
{quote}
Yes. Making this core would mean I'd have to break the task into cp and core.  
CP - for region server groups integration and quota control. Core - for basic 
namespace functionality.

{quote}
Given the level of autonomy namespaces will provide tenants.  what does this 
mean?
{quote}
In a security-enabled cluster only system admins can create tables. Namespaces 
will introduce the notion of namespace admins, a role which will be granted to 
tenants, thus enabling them to create tables themselves.

{quote}
Then, about storing, I suggest not making it part of table name. It seems 
brittle, and will limit our options if we want to add features lately due to 
backward compat.
{quote}
Having it part of the table name makes the changes less invasive (changes in 
meta schema, HTable apis, etc). Though I agree it would be nice to make this 

{quote}
Also, how do we handle existing backward compat if someone already has a table 
name foo.bar?
{quote}
Last I checked '.' around allowed as part of the tablename. The cli will bork 
if '.' is used?

{quote}
Will root/meta have to be renamed hbase.root/hbase.meta to be in correct 
namespace, and how will it affect current assumptions about sorting if yes?
{quote}
As part of backward compatibility we can skip renaming root and meta and just 
explicitly support that they are part of the system namespace? These tables are 
already treated differently anyway?



 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594951#comment-13594951
 ] 

Francis Liu commented on HBASE-8015:


reposting coz of typos:

{quote}
+1 on making core (as opposed to cp-s I assume?)
{quote}
Yes. Making this core would mean I'd have to break the task into cp and core. 
CP - for region server groups integration and quota control. Core - for basic 
namespace functionality.

{quote}
Given the level of autonomy namespaces will provide tenants.  what does this 
mean?
{quote}
In a security-enabled cluster only system admins can create tables. Namespaces 
will introduce the notion of namespace admins, a role which will be granted to 
tenants, thus enabling them to create tables themselves.

{quote}
Then, about storing, I suggest not making it part of table name. It seems 
brittle, and will limit our options if we want to add features lately due to 
backward compat.
{quote}

Having it part of the table name makes the changes less invasive (changes in 
meta schema, HTable apis, etc). Though I agree it would be nice to make this

{quote}
Also, how do we handle existing backward compat if someone already has a table 
name foo.bar?
{quote}
Last I checked '.' around allowed as part of the tablename. The cli will bork 
if '.' is used?

{quote}
Will root/meta have to be renamed hbase.root/hbase.meta to be in correct 
namespace, and how will it affect current assumptions about sorting if yes?
{quote}

As part of backward compatibility we can skip renaming root and meta and just 
explicitly support that they are part of the system namespace? These tables are 
already treated differently anyway?


 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594920#comment-13594920
 ] 

Jeffrey Zhong commented on HBASE-7982:
--

[~saint@gmail.com] Are you OK with the patch? I hope we can get the patch in 
soon. 

Thanks,
-Jeffrey

 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: hbase-7982-combined_1.patch, hbase-7982-combined.patch, 
 hbase-7982-huge-logging.patch, hbase-7982-NPE_2.patch, hbase-7982-NPE.patch


 I was trying to look at why the odd time Hudson OOMEs trying to make a report 
 on 0.95 build #4 https://builds.apache.org/job/hbase-0.95/4/console:
 {code}
 ERROR: Failed to archive test reports
 hudson.util.IOException2: remote file operation failed: 
 /home/jenkins/jenkins-slave/workspace/hbase-0.95 at 
 hudson.remoting.Channel@151a4e3e:ubuntu3
   at hudson.FilePath.act(FilePath.java:861)
   at hudson.FilePath.act(FilePath.java:838)
   at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:87)
   at 
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
   at java.nio.CharBuffer.allocate(CharBuffer.java:329)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
   at java.nio.charset.Charset.decode(Charset.java:791)
   at hudson.tasks.junit.SuiteResult.<init>(SuiteResult.java:215)
 ...
 {code}
 We are trying to allocate a big buffer and failing.
 Looking at the reports being generated, we have quite a few that are > 10MB in 
 size:
 {code}
 durruti:0.95 stack$ find hbase-* -type f -size +1k -exec ls -la {} \;
 -rw-r--r--@ 1 stack  staff  11126492 Feb 27 06:14 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.backup.TestHFileArchiving-output.txt
 -rw-r--r--@ 1 stack  staff  13296009 Feb 27 05:47 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
 -rw-r--r--@ 1 stack  staff  10541898 Feb 27 05:47 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMultiParallel-output.txt
 -rw-r--r--@ 1 stack  staff  25344601 Feb 27 05:51 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient-output.txt
 -rw-r--r--@ 1 stack  staff  17966969 Feb 27 06:12 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction-output.txt
 -rw-r--r--@ 1 stack  staff  17699068 Feb 27 06:09 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit-output.txt
 -rw-r--r--@ 1 stack  staff  17701832 Feb 27 06:07 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed-output.txt
 -rw-r--r--@ 1 stack  staff  717853709 Feb 27 06:17 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailover-output.txt
 -rw-r--r--@ 1 stack  staff  563616793 Feb 27 06:17 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
 {code}
 ... with TestReplicationQueueFailover* being an order of magnitude bigger than 
 the others.
 Looking in the test I see both spewing between 800 and 900 thousand lines in 
 about a minute.  Here is their fixation:
 {code}
 8908998 2013-02-27 06:17:48,176 ERROR 
 [RegionServer:1;hemera.apache.org,35712,1361945801803.logSyncer] 
 wal.FSHLog$LogSyncer(1012): Error while syncing, requesting close of hlog.
 8908999 java.io.IOException: Filesystem closed
 8909000 ,...at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:319)
 8909001 ,...at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
 8909002 ,...at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3843)
 8909003 ,...at 
 org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
 8909004 ,...at 
 org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
 8909005 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:248)
 8909006 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1120)
 8909007 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1058)
 8909008 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.sync(FSHLog.java:1228)
 8909009 ,...at 
 

[jira] [Commented] (HBASE-7403) Online Merge

2013-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594944#comment-13594944
 ] 

Ted Yu commented on HBASE-7403:
---

@Daniel:
0.90.x is too different from 0.94, let alone trunk.

0.94.y, a future release, would be a more realistic target.

 Online Merge
 

 Key: HBASE-7403
 URL: https://issues.apache.org/jira/browse/HBASE-7403
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.95.0, 0.94.6
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.95.0

 Attachments: 7403-trunkv5.patch, 7403-trunkv6.patch, 7403v5.diff, 
 7403-v5.txt, 7403v5.txt, hbase-7403-94v1.patch, hbase-7403-trunkv10.patch, 
 hbase-7403-trunkv11.patch, hbase-7403-trunkv12.patch, 
 hbase-7403-trunkv13.patch, hbase-7403-trunkv14.patch, 
 hbase-7403-trunkv15.patch, hbase-7403-trunkv16.patch, 
 hbase-7403-trunkv1.patch, hbase-7403-trunkv5.patch, hbase-7403-trunkv6.patch, 
 hbase-7403-trunkv7.patch, hbase-7403-trunkv8.patch, hbase-7403-trunkv9.patch, 
 merge region.pdf


 The feature of this online merge:
 1.Online,no necessary to disable table
 2.Less change for current code, could applied in trunk,0.94 or 0.92,0.90
 3.Easy to call merege request, no need to input a long region name, only 
 encoded name enough
 4.No limit when operation, you don't need to tabke care the events like 
 Server Dead, Balance, Split, Disabing/Enabing table, no need to take care 
 whether you send a wrong merge request, it has alread done for you
 5.Only little offline time for two merging regions
 Usage:
 1.Tool:  
 bin/hbase org.apache.hadoop.hbase.util.OnlineMerge [-force] [-async] [-show] 
 table-name region-encodedname-1 region-encodedname-2
 2.API: static void MergeManager#createMergeRequest
 We need merge in the following cases:
 1.Region hole or region overlap that can’t be fixed by hbck
 2.Regions become empty because of TTL or an unreasonable rowkey design
 3.Regions are always empty or very small because of presplitting when the table was created
 4.Too many empty or small regions reduce system performance (e.g. 
 mslab)
 Current merge tools only work offline and cannot redo the merge if an 
 exception is thrown in the process, leaving dirty data
 For an online system, we need an online merge.
 The implementation logic of this patch for Online Merge is:
 For example, merge regionA and regionB into regionC
 1.Offline the two regions A and B
 2.Merge the two regions in the HDFS(Create regionC’s directory, move 
 regionA’s and regionB’s file to regionC’s directory, delete regionA’s and 
 regionB’s directory)
 3.Add the merged regionC to .META.
 4.Assign the merged regionC
 By design, once we have done the merge work in HDFS, we can redo 
 it until it succeeds if it throws an exception, aborts, or the server restarts, but 
 it cannot be rolled back. 
 It depends on:
 Using zookeeper to record the transaction journal state, which makes redo easier
 Using zookeeper to send/receive merge requests
 Executing the merge transaction on the master
 Supporting merge requests through the API or shell tool
 About the merge process, please see the attachment and patch
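
 The redo behaviour described above can be pictured as a small journal state machine. This is a sketch, not the patch: the class and step names are invented for illustration, and the rule that rollback is possible only before the HDFS merge starts is an assumption drawn from the description.

```java
import java.util.ArrayList;
import java.util.List;

final class MergeJournal {
  // Steps in the order listed in the description; the names are hypothetical.
  enum Step { OFFLINE_REGIONS, MERGE_IN_HDFS, ADD_TO_META, ASSIGN_MERGED, DONE }

  // Assumption: rollback is only safe before any HDFS work has started.
  static boolean canRollBack(Step journaled) {
    return journaled.ordinal() < Step.MERGE_IN_HDFS.ordinal();
  }

  // Steps to (re)run after a restart, given the step the journal recorded as in progress.
  // Each step has to be idempotent for this to be safe.
  static List<Step> stepsToRedo(Step journaled) {
    List<Step> todo = new ArrayList<Step>();
    for (Step s : Step.values()) {
      if (s.ordinal() >= journaled.ordinal() && s != Step.DONE) {
        todo.add(s);
      }
    }
    return todo;
  }
}
```

 With the journal kept in zookeeper, a restarted master would read the recorded step and run `stepsToRedo` from there instead of starting over.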

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7824) Improve master start up time when there is log splitting work

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594952#comment-13594952
 ] 

Jeffrey Zhong commented on HBASE-7824:
--

[~ram_krish] Are you all right with my explanation in 
https://issues.apache.org/jira/browse/HBASE-7824?focusedCommentId=13592721&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13592721?
 So far the test cases have passed with small modifications to the file 
src/test/java/org/apache/hadoop/hbase/MiniHBaseCluster.java.

Thanks,
-Jeffrey

 Improve master start up time when there is log splitting work
 -

 Key: HBASE-7824
 URL: https://issues.apache.org/jira/browse/HBASE-7824
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.94.7

 Attachments: HBASE-7824_3.patch, hbase-7824.patch, hbase-7824_v2.patch


 When there is log split work going on, master start up waits till all log 
 split work completes even though the log split has nothing to do with meta 
 region servers.
 That is bad behavior, considering a master node can run while log splitting is 
 happening, yet its start up is blocked by log split work. 
 Since the master is something of a single point of failure, we should start it ASAP.



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594946#comment-13594946
 ] 

Enis Soztutar commented on HBASE-8015:
--

+1 on having namespaces in core. Namespaces/databases are universally 
understood in the database space. We can keep the grouping of 
regionservers per namespace out of core, and deliver that as a part of the 
other region grouping issue. 

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594949#comment-13594949
 ] 

Ted Yu commented on HBASE-8015:
---

w.r.t. dot in table name:
{code}
  public static final String VALID_USER_TABLE_REGEX = 
"(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";
{code}
So dot should be allowed.
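
As a quick check of the pattern above, here is a minimal sketch. The class and method names are made up for illustration, and the pattern is taken to be an ordinary Java string literal; only the pattern text itself comes from the snippet.

```java
import java.util.regex.Pattern;

final class TableNameCheck {
  // Pattern text as quoted in the comment; treating it as a string literal is an assumption.
  static final String VALID_USER_TABLE_REGEX = "(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";
  private static final Pattern VALID = Pattern.compile(VALID_USER_TABLE_REGEX);

  // Returns true when the whole name matches the pattern.
  static boolean isValid(String name) {
    return VALID.matcher(name).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("ns.table")); // dot inside the name
    System.out.println(isValid(".table"));   // dot as first character
    System.out.println(isValid("-table"));   // hyphen as first character
  }
}
```

The first character class has no '.' or '-', so those two characters are accepted anywhere except the first position.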



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594954#comment-13594954
 ] 

Ted Yu commented on HBASE-8015:
---

@Francis:
In the future when you want to correct your previous comment, you can quote the 
previous one and provide correction below the quote.



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594958#comment-13594958
 ] 

Ted Yu commented on HBASE-8015:
---

HBASE-7999 provides System table support.



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594961#comment-13594961
 ] 

Francis Liu commented on HBASE-8015:


{quote}
w.r.t. dot in table name:

  public static final String VALID_USER_TABLE_REGEX = 
"(?:[a-zA-Z_0-9][a-zA-Z_0-9.-]*)";

So dot should be allowed
{quote}
Thanks for the clarification. It seems '.' and '-' are disallowed only as 
the first character. For backward compatibility, why don't we create 
namespaces for those tables that are named that way? 




[jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594962#comment-13594962
 ] 

Jeffrey Zhong commented on HBASE-6772:
--

Agree. The 0.2s is a configurable value, and we can ship the feature with a good 
value for most situations after we test the feature. 


 Make the Distributed Split HDFS Location aware
 --

 Key: HBASE-6772
 URL: https://issues.apache.org/jira/browse/HBASE-6772
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Jeffrey Zhong

 During a hlog split, each log file (a single hdfs block) is allocated to a 
 different region server. This region server reads the file and creates the 
 recovery edit files.
 The allocation to the region server is random. We could take into account the 
 locations of the log file to split:
 - the reads would be local, hence faster. This allows short circuit as well.
 - less network i/o used during a failure (and this is important)
 - we would be sure to read from a working datanode, hence we're sure we won't 
 have read errors. Read errors slow the split process a lot, as we often end up 
 waiting for timeouts. 
 We need to limit the calls to the namenode however.
 Typical algo could be:
 - the master gets the locations of the hlog files
 - it writes them into ZK, if possible in one transaction (this way all the 
 tasks are visible all together, allowing some arbitration by the region server).
 - when the regionserver receives the event, it checks all logs and all 
 locations.
 - if there is a match, it takes it
 - if not, it waits something like 0.2s (to give other regionservers time 
 to take it if the location matches), and takes any remaining task.
 Drawbacks are:
 - a 0.2s delay added if there is no regionserver available on one of the 
 locations. It's likely possible to remove it with some extra synchronization.
 - Small increase in complexity and dependency on HDFS
 Considering the advantages, it's worth it imho.
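
 The regionserver side of the algorithm above might be sketched as follows. Everything here is hypothetical (the class name, the map from task to replica hosts, the grace period parameter); it only illustrates "take a local task, else wait, else take anything" and ignores ZK entirely.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SplitTaskPicker {
  // Returns the first task with a replica on localHost. If there is none and the
  // grace period has elapsed, returns any remaining task. Otherwise returns empty,
  // meaning the caller should keep waiting.
  static Optional<String> pick(Map<String, List<String>> taskLocations,
                               String localHost, long waitedMs, long graceMs) {
    for (Map.Entry<String, List<String>> e : taskLocations.entrySet()) {
      if (e.getValue().contains(localHost)) {
        return Optional.of(e.getKey());
      }
    }
    if (waitedMs >= graceMs && !taskLocations.isEmpty()) {
      return Optional.of(taskLocations.keySet().iterator().next());
    }
    return Optional.empty();
  }
}
```

 A caller would pass something like 200 for `graceMs` to mirror the 0.2s in the description; a real implementation would also have to handle the race when two regionservers claim the same task.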



[jira] [Commented] (HBASE-7842) Add compaction policy that explores more storefile groups

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594969#comment-13594969
 ] 

Sergey Shelukhin commented on HBASE-7842:
-

{code}
compactionPolicy = new ExploringCompactionPolicy(this.conf, this.store/*as 
StoreConfigInfo*/);
{code}
HBASE-7935 will make this pluggable separately, I am planning to commit it 
after running tests (it has 2 +1s).
Regardless, it appears that this patch both replaces default policy and hijacks 
the default test, so the default ratio
algorithm becomes a poor bastard child, not used and not tested :) Should we 
delete it altogether and swap it with yours in place?

bq.  I've seen some really nasty compaction behavior lately with what's 
currently the default.
Can you add tests for those with new policy? Would also be useful to illustrate.

{code}
if (potentialMatchFiles.size() > bestSelection.size() ||
(potentialMatchFiles.size() == bestSelection.size() && size < 
bestSize)) {
{code}
Interesting heuristic... it looks reasonable, but have you tried ratio proposed 
above? E.g. getting rid of 3 files of 5Mb each is better than getting rid of 4 
files of 500Mb each.
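
To make the comparison concrete, here is a small sketch of the two rules. It reads the condition above as "more files wins, ties go to the smaller total size"; that reading, the class name, and the method names are assumptions, not code from the patch. Sizes are assumed to be positive.

```java
final class SelectionCompare {
  // Count-first rule: more files wins, ties go to the smaller total size.
  static boolean countFirstBetter(int files, long size, int bestFiles, long bestSize) {
    return files > bestFiles || (files == bestFiles && size < bestSize);
  }

  // Ratio rule: prefer more files removed per byte rewritten.
  // files/size > bestFiles/bestSize, compared by cross-multiplication to avoid division.
  static boolean ratioBetter(int files, long size, int bestFiles, long bestSize) {
    return (long) files * bestSize > (long) bestFiles * size;
  }
}
```

With 3 files of 5Mb against 4 files of 500Mb, the count-first rule prefers the four large files while the ratio rule prefers the three small ones.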

{code}
if (files.size() <= 2) {
  return  true;
}
{code}
Why? What if files are 500 5?

{code}
  long sumAllOtherFilesize = 0;
  for (int j =0; j < files.size(); j++) {
if (i == j) continue;
sumAllOtherFilesize += files.get(j).getReader().length();
  }
{code}
Double nested loop is unnecessary. In fact, we already get total size outside; 
we might as well do it before this check, pass it in, and then we can just 
subtract each file's size from it in a loop to get the size of all other files. Shorter 
code too :)
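
The suggestion above could look roughly like this. A sketch only: the method name, the list of sizes, and the ratio parameter are stand-ins for illustration, not the patch's actual signature, and the early return for fewer than two files is an assumption.

```java
import java.util.List;

final class RatioCheck {
  // True when every file is no larger than ratio * (sum of the other files).
  static boolean filesInRatio(List<Long> sizes, double ratio) {
    if (sizes.size() < 2) {
      return true; // nothing to compare against
    }
    long total = 0;
    for (long s : sizes) {
      total += s;
    }
    for (long s : sizes) {
      long others = total - s; // replaces the inner loop over all other files
      if (s > others * ratio) {
        return false;
      }
    }
    return true;
  }
}
```

The total is computed once, so the check is linear in the number of files rather than quadratic.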


{code}
minFiles = comConf.getMinFilesToCompact();
maxFiles = comConf.getMaxFilesToCompact();
minCompactionSize = comConf.getMinCompactSize();
ratio = comConf.getCompactionRatio();
offPeakRatio = comConf.getCompactionRatioOffPeak();
{code}
Nit: necessary? The whole point of the compaction config from FB patch was to 
make these easily accessible everywhere,
as far as I see.

{code}
List<StoreFile> bestSelection = new ArrayList<StoreFile>(0);
{code}
Nit: not necessary.

{code}
for (int start = 0; start < candidates.size(); start++) {
{code}
Doesn't have to consider start positions with fewer than minFiles files left before the end.

{code}
 for(int currentEnd = start; currentEnd < candidates.size(); currentEnd++) {
{code}
Can go from start plus minfiles, and check maxfiles too, then checks inside 
become unnecessary...
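
A sketch of loop bounds along those lines, assuming 1 <= minFiles <= maxFiles. The names are illustrative, and the body only counts windows where a real policy would evaluate them.

```java
final class WindowBounds {
  // Counts contiguous windows [start, end] that hold between minFiles and maxFiles files
  // out of n candidates.
  static int countWindows(int n, int minFiles, int maxFiles) {
    int count = 0;
    // Stop once fewer than minFiles candidates remain after start.
    for (int start = 0; start + minFiles <= n; start++) {
      int lastEnd = Math.min(n - 1, start + maxFiles - 1);
      // Begin at the first end index that already gives minFiles files.
      for (int end = start + minFiles - 1; end <= lastEnd; end++) {
        count++; // a real policy would look at candidates.subList(start, end + 1) here
      }
    }
    return count;
  }
}
```

Because both bounds are built into the loops, no size check is needed inside the body.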

{code}
return  true;
for (int j =0;
singleFileSize   sumAllOtherFilesize
{code}
Etc.
Nit: spacing.
Nit^2: Many blank lines.

Nit: the main loop too could be done in a more optimal manner, probably doesn't 
matter.


 Add compaction policy that explores more storefile groups
 -

 Key: HBASE-7842
 URL: https://issues.apache.org/jira/browse/HBASE-7842
 Project: HBase
  Issue Type: New Feature
  Components: Compaction
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-7842-0.patch, HBASE-7842-2.patch, 
 HBASE-7842-3.patch, HBASE-7842-4.patch


 Some workloads that are not as stable can have compactions that are too large 
 or too small using the current storefile selection algorithm.
 Currently:
 * Find the first file that Size(fi) <= Sum(0, i-1, FileSize(fx))
 * Ensure that there are the min number of files (if there aren't then bail 
 out)
 * If there are too many files keep the larger ones.
 I would propose something like:
 * Find all sets of storefiles where every file satisfies 
 ** FileSize(fi) <= Sum(0, i-1, FileSize(fx))
 ** Num files in set <= max
 ** Num Files in set >= min
 * Then pick the set of files that maximizes ((# storefiles in set) / 
 Sum(FileSize(fx)))
 The thinking is that the above algorithm is pretty easy to reason about, all 
 files satisfy the ratio, and should rewrite the least amount of data to get 
 the biggest impact in seeks.



[jira] [Updated] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francis Liu updated HBASE-8015:
---

Attachment: NamespaceDesign.pdf

Initial draft of design. This was originally intended to be implemented as 
coprocessors, thus its design was made to be as non-invasive as possible. 

[~enis] suggested that it would be better to make this part of core. I'd be up 
for doing that and open to other changes to make things more integrated.

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: Bug
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594977#comment-13594977
 ] 

Enis Soztutar commented on HBASE-8015:
--

bq. Having it part of the table name makes the changes less invasive (changes 
in meta schema, HTable apis, etc). Though I agree it would be nice to make this
I think we should have namespaces as first class citizens. Namespaces have been 
traditionally used for grouping tables, setup replication, restricting access, 
etc. As a database, we can also use namespaces for acl, replication, backup, 
etc. 



[jira] [Commented] (HBASE-7935) make policy and compactor in default store engine separately pluggable (for things like tier-based, and default policy experiments with permutations)

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13594995#comment-13594995
 ] 

Sergey Shelukhin commented on HBASE-7935:
-

Matteo also +1-d in /r, so I will commit in the afternoon if there are no 
objections.

 make policy and compactor in default store engine separately pluggable (for 
 things like tier-based, and default policy experiments with permutations)
 -

 Key: HBASE-7935
 URL: https://issues.apache.org/jira/browse/HBASE-7935
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-7935-v0.patch, HBASE-7935-v0-with-7843.patch, 
 HBASE-7935-v1.patch, HBASE-7935-v2.patch, HBASE-7935-v3.patch, 
 HBASE-7935-v3.patch


 Technically, StoreEngine can be used to achieve any permutations of things, 
 but to make it more convenient to replace compaction policy/compactor in 
 standard schemes like tier-based, we can add separate hooks in 
 DefaultStoreEngine (as long as custom ones conform to its default 
 expectations e.g. flat list of sorted files, etc.)



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595005#comment-13595005
 ] 

Francis Liu commented on HBASE-8015:


{quote}
I think we should have namespaces as first class citizens. Namespaces have been 
traditionally used for grouping tables, setup replication, restricting access, 
etc. As a database, we can also use namespaces for acl, replication, backup, 
etc. 
{quote}

They will be first class citizens. There will be a namespace table for 
namespace meta information. Also, embedding namespace information in the table name 
does not prevent this; we are just introducing the notion of fully-qualified 
table names (which we should introduce anyway) and storing them in this form in 
meta/root.



[jira] [Commented] (HBASE-7403) Online Merge

2013-03-06 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595007#comment-13595007
 ] 

Lars Hofhansl commented on HBASE-7403:
--

Let's prove this out in trunk first. Then we can make separate decisions about 
other releases.
0.90 and 0.92 are out of the question (imho), and 0.94 might be too (as this needs a 
lot of other plumbing - table locks, atomic META updates, etc)



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595009#comment-13595009
 ] 

Francis Liu commented on HBASE-8015:


Given that '.' is already used, should we pick another delimiter for 
namespaces? Or should we provide a backward-compatible way to support this, i.e. 
creating namespaces for table names with '.'? 
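
One way to picture the backward-compatible option is the sketch below. It is purely illustrative: the delimiter choice is still open in this discussion, and the class name, the "default" namespace, and the split-on-first-dot rule are all assumptions.

```java
final class QualifiedTableName {
  // Hypothetical namespace used for names that contain no delimiter.
  static final String DEFAULT_NAMESPACE = "default";

  // Text before the first '.', or the default namespace when there is no '.'.
  static String namespaceOf(String name) {
    int i = name.indexOf('.');
    return i < 0 ? DEFAULT_NAMESPACE : name.substring(0, i);
  }

  // Text after the first '.', or the whole name when there is no '.'.
  static String tableOf(String name) {
    int i = name.indexOf('.');
    return i < 0 ? name : name.substring(i + 1);
  }
}
```

Under this rule an existing table named with a '.' would land in a namespace named after its prefix, which is the migration question raised above.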



[jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware

2013-03-06 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595023#comment-13595023
 ] 

nkeywal commented on HBASE-6772:


The new design is better than my original proposal. I'm +1. Devaraj's comment 
is important as well imho, so we should put this info in ZK too.
Just one point: the master should provide the full list of regionservers owning 
a copy. This way:
 - if one of the regionservers is actually dead, the task can be picked up by another 
one
 - it's possible to optimize the choice in the regionserver: if the RS sees 
it's the only one for a block, it can pick that block instead of another one that has 
more potential regionservers.
 - + the rack already mentioned by Devaraj.
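
The second point could be sketched like this, assuming the master publishes, for each task, the list of regionservers holding a copy. The class name, method name, and map layout are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class ScarceFirst {
  // Among tasks with a copy on localHost, returns the one with the fewest owners,
  // or null when localHost owns none of them.
  static String pickScarcest(Map<String, List<String>> owners, String localHost) {
    String best = null;
    int bestCount = Integer.MAX_VALUE;
    for (Map.Entry<String, List<String>> e : owners.entrySet()) {
      List<String> hosts = e.getValue();
      if (hosts.contains(localHost) && hosts.size() < bestCount) {
        best = e.getKey();
        bestCount = hosts.size();
      }
    }
    return best;
  }
}
```

Taking the scarcest task first leaves the tasks with many potential owners for the other regionservers.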




[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients

2013-03-06 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595034#comment-13595034
 ] 

nkeywal commented on HBASE-7590:


It's on RB, waiting for reviews before being committed :-).

 Add a costless notifications mechanism from master to regionservers & clients
 -

 Key: HBASE-7590
 URL: https://issues.apache.org/jira/browse/HBASE-7590
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 7590.inprogress.patch, 7590.v1.patch, 
 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch


 It would be very useful to add a mechanism to distribute some information to 
 the clients and regionservers. In particular, it would be useful to know globally 
 (regionservers + client apps) that some regionservers are dead. This would 
 allow:
 - to lower the load on the system, without clients using stale information 
 and going to dead machines
 - to make the recovery faster from a client point of view. It's common to use 
 large timeouts on the client side, so the client may need a lot of time 
 before declaring a region server dead and trying another one. If the client 
 receives the information about a region server's state separately, it can take 
 the right decision, and continue or stop waiting accordingly.
 We can also send more information, for example instructions like 'slow down' 
 to instruct the client to increase the retry delay and so on.
  Technically, the master could send this information. To lower the load on 
 the system, we should:
 - have a multicast communication (i.e. the master does not have to connect to 
 all servers by tcp), with one packet every 10 seconds or so.
 - receivers should not depend on this: if the information is available great. 
 If not, it should not break anything.
 - it should be optional.
 So at the end we would have a thread in the master sending a protobuf message 
 about the dead servers on a multicast socket. If the socket is not 
 configured, it does not do anything. On the client side, when we receive 
 information that a node is dead, we refresh the cache about it.
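
 The client-side reaction described above might be sketched as follows. This is not the patch: the cache is reduced to a plain map from region name to server name, all names are hypothetical, and the multicast and protobuf parts are left out.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

final class LocationCache {
  private final Map<String, String> regionToServer = new HashMap<String, String>();

  void put(String region, String server) {
    regionToServer.put(region, server);
  }

  String get(String region) {
    return regionToServer.get(region);
  }

  // Drops every cached location that points at a server reported dead.
  // Returns how many entries were removed.
  int onDeadServers(Set<String> dead) {
    int removed = 0;
    Iterator<Map.Entry<String, String>> it = regionToServer.entrySet().iterator();
    while (it.hasNext()) {
      if (dead.contains(it.next().getValue())) {
        it.remove();
        removed++;
      }
    }
    return removed;
  }
}
```

 Since the notification is optional, a client that never receives it behaves exactly as before and discovers the dead server through its normal timeouts.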



[jira] [Commented] (HBASE-7842) Add compaction policy that explores more storefile groups

2013-03-06 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595035#comment-13595035
 ] 

Elliott Clark commented on HBASE-7842:
--

bq.Can you add tests for those with new policy?
Not really; it relies on bulk loading, regular inserts, and compactions working 
together to produce some strange orders of files and seq numbers. Though one of 
the added tests ((251, 253, 251, maxSize -1)) should cover things all right.

bq.Why? What if files are 500 5?
Yep, off-by-one error. Should be < 2 rather than <= 2.

bq.Nit: not necessary.
Yes it is. If the loop never finds something that passes the pre-conditions we 
need to have an empty list to return or there will be NPEs.



[jira] [Commented] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13595045#comment-13595045
 ] 

Enis Soztutar commented on HBASE-7721:
--

These failed tests are unrelated. They succeed locally.

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch


 Thanks to Lars' local transactions patch (HBASE-5229), we can entertain the 
 possibility of doing local transactions within META regions.
 We need this mainly for region splits and merges. Clients scan the META 
 concurrently with the split/merge operations, and to prevent the clients from 
 seeing overlapping region boundaries or holes in META, we jump through hoops. 
 For more background, see BlockingMetaScannerVisitor, HBASE-5986, and my 
 comments at https://reviews.apache.org/r/8716/. 
 Now, for the actual implementation options: 
  1. As outlined in http://hadoop-hbase.blogspot.com/2012_02_01_archive.html, 
- We have to implement a Custom RegionSplitPolicy for the META regions to 
 ensure that a table's regions are always co-located in the same META region. 
 Then we can add MultiRowMutationEndpoint as a system level coprocessor, and 
 use it for META operations. 
  2. Do something like HBASE-7716, and expose local atomic multi-row operations
 as a native API.
  3. Move META to zookeeper. Use zookeeper.multi.  
 Then we can change region split / merge logic to make use of atomic META 
 operations. 
 Thoughts, suggestions? 
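
 To illustrate why atomicity matters here, below is a toy sketch in Python. It
 is not HBase code: ToyMeta, mutate_rows and has_holes_or_overlaps are
 hypothetical names, and a single lock stands in for a region-local
 transaction. The point is only that a split which removes the parent row and
 adds both daughter rows in one step never exposes a hole or an overlap to a
 concurrent scanner.

```python
import threading
from typing import Dict, List, Tuple

# (start key, end key); a hypothetical simplification of a META row.
Region = Tuple[str, str]


class ToyMeta:
    """In-memory stand-in for one META region, keyed by region name."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[str, Region] = {}

    def mutate_rows(self, deletes: List[str], puts: Dict[str, Region]) -> None:
        # All deletes and puts become visible together, or none of them do.
        with self._lock:
            missing = [name for name in deletes if name not in self._rows]
            if missing:
                raise KeyError(missing[0])
            for name in deletes:
                del self._rows[name]
            self._rows.update(puts)

    def scan(self) -> List[Tuple[str, str, str]]:
        # Returns (start, end, name) tuples sorted by start key.
        with self._lock:
            return sorted((start, end, name) for name, (start, end) in self._rows.items())


def has_holes_or_overlaps(regions: List[Tuple[str, str, str]]) -> bool:
    # Each region must end exactly where the next one starts.
    return any(prev[1] != nxt[0] for prev, nxt in zip(regions, regions[1:]))


meta = ToyMeta()
meta.mutate_rows([], {"parent": ("a", "z")})
# The split: drop the parent and add both daughters in a single mutation.
meta.mutate_rows(["parent"], {"daughterA": ("a", "m"), "daughterB": ("m", "z")})
```

 A scan taken before or after the second call sees either the parent alone or
 both daughters, never a mix. Doing the same thing with three independent
 writes would leave windows where a scanner could see the parent next to one
 daughter, or only one daughter.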



[jira] [Commented] (HBASE-8007) Adopt TestLoadAndVerify from BigTop

2013-03-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595046#comment-13595046
 ] 

Nick Dimiduk commented on HBASE-8007:
-

I guess I'm late to the party, but I think the package name is wrong; it should
be {{org.apache.hadoop.hbase}}, and not in {{test}}.

 Adopt TestLoadAndVerify from BigTop
 ---

 Key: HBASE-8007
 URL: https://issues.apache.org/jira/browse/HBASE-8007
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0, 0.94.6

 Attachments: bigtop-differences.diff, hbase-8007_v1.patch, 
 hbase-8007_v2.patch


 We have found BigTop's TestLoadAndVerify quite useful for testing, and I 
 propose we adopt it in the HBase code base.
 https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hbase/src/main/groovy/org/apache/bigtop/itest/hbase/system/TestLoadAndVerify.java
 There were some discussions previously about whether HBase or BigTop should 
 host system tests specific to HBase, and I believe this specific one belongs 
 in the HBase code base. 
 We can maintain the code, and release it as a part of HBase, so that testing 
 it against secure deployments, Hadoop2, etc. is easier (BIGTOP-853 fixes the 
 test to work with secure clusters).



[jira] [Commented] (HBASE-7590) Add a costless notifications mechanism from master to regionservers & clients

2013-03-06 Thread Devaraj Das (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595051#comment-13595051
 ] 

Devaraj Das commented on HBASE-7590:


FYI the RB link is https://reviews.apache.org/r/9731/ .. Am taking a look at 
the patch.

 Add a costless notifications mechanism from master to regionservers & clients
 -

 Key: HBASE-7590
 URL: https://issues.apache.org/jira/browse/HBASE-7590
 Project: HBase
  Issue Type: Bug
  Components: Client, master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 7590.inprogress.patch, 7590.v1.patch, 
 7590.v1-rebased.patch, 7590.v2.patch, 7590.v3.patch


 It would be very useful to add a mechanism to distribute some information to 
 the clients and regionservers. In particular, it would be useful to know 
 globally (regionservers + client apps) that some regionservers are dead. This 
 would allow us:
 - to lower the load on the system, without clients using stale information 
 and going to dead machines
 - to make the recovery faster from a client point of view. It's common to use 
 large timeouts on the client side, so the client may need a lot of time 
 before declaring a region server dead and trying another one. If the client 
 receives the information about a region server's state separately, it can make 
 the right decision, and continue or stop waiting accordingly.
 We can also send more information, for example instructions like 'slow down' 
 to instruct the client to increase the retry delay and so on.
  Technically, the master could send this information. To lower the load on 
 the system, we should:
 - have a multicast communication (i.e. the master does not have to connect to 
 all servers by tcp), with one packet every 10 seconds or so.
 - receivers should not depend on this: if the information is available, great. 
 If not, it should not break anything.
 - it should be optional.
 So at the end we would have a thread in the master sending a protobuf message 
 about the dead servers on a multicast socket. If the socket is not 
 configured, it does not do anything. On the client side, when we receive 
 information that a node is dead, we refresh the cache about it.
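
 The client-side half of this could look roughly like the Python sketch below.
 It leaves out the multicast socket and the protobuf decoding entirely, and
 RegionLocationCache with its methods is a hypothetical name rather than an
 existing HBase class. It only shows the cache reaction: entries that point at
 a server reported dead are dropped, and if no notification ever arrives the
 cache behaves exactly as before.

```python
from typing import Dict, Iterable, Optional


class RegionLocationCache:
    """Toy client-side cache mapping region name -> server name."""

    def __init__(self) -> None:
        self._locations: Dict[str, str] = {}

    def put(self, region: str, server: str) -> None:
        self._locations[region] = server

    def get(self, region: str) -> Optional[str]:
        # None means "not cached": the caller has to look the region up again.
        return self._locations.get(region)

    def on_dead_servers(self, servers: Iterable[str]) -> int:
        """Drop every cached location on a dead server; return how many were dropped."""
        dead = set(servers)
        stale = [region for region, server in self._locations.items() if server in dead]
        for region in stale:
            del self._locations[region]
        return len(stale)


cache = RegionLocationCache()
cache.put("region-1", "rs1")
cache.put("region-2", "rs2")
cache.put("region-3", "rs1")
evicted = cache.on_dead_servers(["rs1"])  # as if a notification named rs1
```

 After the call, lookups for the regions that lived on rs1 miss the cache and
 force a fresh location lookup, instead of waiting out a long client timeout
 against a dead machine.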



[jira] [Commented] (HBASE-8007) Adopt TestLoadAndVerify from BigTop

2013-03-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595055#comment-13595055
 ] 

Enis Soztutar commented on HBASE-8007:
--

This is a pure test class. I think we should reserve the top level package for 
more important stuff. I know that some of our IT's are in the top package, 
which is my bad. We should move them out of the top-level package instead. 

 Adopt TestLoadAndVerify from BigTop
 ---

 Key: HBASE-8007
 URL: https://issues.apache.org/jira/browse/HBASE-8007
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.95.0, 0.98.0, 0.94.6
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0, 0.94.6

 Attachments: bigtop-differences.diff, hbase-8007_v1.patch, 
 hbase-8007_v2.patch


 We have found BigTop's TestLoadAndVerify quite useful for testing, and I 
 propose we adopt it in the HBase code base.
 https://github.com/apache/bigtop/blob/master/bigtop-tests/test-artifacts/hbase/src/main/groovy/org/apache/bigtop/itest/hbase/system/TestLoadAndVerify.java
 There were some discussions previously about whether HBase or BigTop should 
 host system tests specific to HBase, and I believe this specific one belongs 
 in the HBase code base. 
 We can maintain the code, and release it as a part of HBase, so that testing 
 it against secure deployments, Hadoop2, etc. is easier (BIGTOP-853 fixes the 
 test to work with secure clusters).



[jira] [Commented] (HBASE-8008) Fix DirFilter usage to be consistent

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595058#comment-13595058
 ] 

Hudson commented on HBASE-8008:
---

Integrated in HBase-TRUNK #3919 (See 
[https://builds.apache.org/job/HBase-TRUNK/3919/])
HBASE-8008: Fix DirFilter usage to be consistent (Revision 1453465)

 Result = FAILURE
jyates : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/snapshot/SnapshotDescriptionUtils.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java


 Fix DirFilter usage to be consistent
 

 Key: HBASE-8008
 URL: https://issues.apache.org/jira/browse/HBASE-8008
 Project: HBase
  Issue Type: Sub-task
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: hbase-8008-r0.patch, hbase-8008-r1.patch


 Currently the DirFilter automatically filters out 
 HConstants.HBASE_NON_USER_TABLE_DIRS, which is not needed in most cases. We 
 should switch the usage so that people actually use a plain directory filter, 
 and then have a special filter for when they are looking for tables 
 specifically.



[jira] [Commented] (HBASE-8014) Backport HBASE-6915 to 0.94.

2013-03-06 Thread Jean-Marc Spaggiari (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595059#comment-13595059
 ] 

Jean-Marc Spaggiari commented on HBASE-8014:


Tests passed with 0.94.6RC0 + this patch with JDK 1.7.0_13.

 Backport HBASE-6915 to 0.94.
 

 Key: HBASE-8014
 URL: https://issues.apache.org/jira/browse/HBASE-8014
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
Priority: Critical
 Attachments: HBASE-8014-v0-0.94.patch


 JDK 1.7 changed some data sizes. The goal of this JIRA is to backport 
 HBASE-6915 to 0.94.



[jira] [Updated] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread stack (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-7982:
-

Attachment: 7982v3.txt

Jeffrey's patch but have it use new utility in FSUtils for matching path 
component of a Path URI.

You ok w/ this addition Jeffrey?  Was afraid we'd get the unexpected just when 
we needed it most... in middle of a recovery...

 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: 7982v3.txt, hbase-7982-combined_1.patch, 
 hbase-7982-combined.patch, hbase-7982-huge-logging.patch, 
 hbase-7982-NPE_2.patch, hbase-7982-NPE.patch


 I was trying to look at why the odd time Hudson OOMEs trying to make a report 
 on 0.95 build #4 https://builds.apache.org/job/hbase-0.95/4/console:
 {code}
 ERROR: Failed to archive test reports
 hudson.util.IOException2: remote file operation failed: 
 /home/jenkins/jenkins-slave/workspace/hbase-0.95 at 
 hudson.remoting.Channel@151a4e3e:ubuntu3
   at hudson.FilePath.act(FilePath.java:861)
   at hudson.FilePath.act(FilePath.java:838)
   at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:87)
   at 
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
   at java.nio.CharBuffer.allocate(CharBuffer.java:329)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
   at java.nio.charset.Charset.decode(Charset.java:791)
    at hudson.tasks.junit.SuiteResult.<init>(SuiteResult.java:215)
 ...
 {code}
 We are trying to allocate a big buffer and failing.
 Looking at reports being generated, we have quite a few that are > 10MB in 
 size:
 {code}
 durruti:0.95 stack$ find hbase-* -type f -size +1k -exec ls -la {} \;
 -rw-r--r--@ 1 stack  staff  11126492 Feb 27 06:14 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.backup.TestHFileArchiving-output.txt
 -rw-r--r--@ 1 stack  staff  13296009 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
 -rw-r--r--@ 1 stack  staff  10541898 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMultiParallel-output.txt
 -rw-r--r--@ 1 stack  staff  25344601 Feb 27 05:51 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient-output.txt
 -rw-r--r--@ 1 stack  staff  17966969 Feb 27 06:12 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction-output.txt
 -rw-r--r--@ 1 stack  staff  17699068 Feb 27 06:09 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit-output.txt
 -rw-r--r--@ 1 stack  staff  17701832 Feb 27 06:07 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed-output.txt
 -rw-r--r--@ 1 stack  staff  717853709 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailover-output.txt
 -rw-r--r--@ 1 stack  staff  563616793 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
 {code}
 ... with TestReplicationQueueFailover* being an order of magnitude bigger than 
 the others.
 Looking in the test I see both spewing between 800 and 900 thousand lines in 
 about a minute.  Here is their fixation:
 {code}
 8908998 2013-02-27 06:17:48,176 ERROR 
 [RegionServer:1;hemera.apache.org,35712,1361945801803.logSyncer] 
 wal.FSHLog$LogSyncer(1012): Error while syncing, requesting close of hlog.
 8908999 java.io.IOException: Filesystem closed
 8909000 ,...at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:319)
 8909001 ,...at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
 8909002 ,...at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3843)
 8909003 ,...at 
 org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
 8909004 ,...at 
 org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
 8909005 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:248)
 8909006 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1120)
 8909007 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1058)
 8909008 ,...at 
 

[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595075#comment-13595075
 ] 

Sergey Shelukhin commented on HBASE-8015:
-

The latter may not necessarily be backward compatible...

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Updated] (HBASE-7935) make policy and compactor in default store engine separately pluggable (for things like tier-based, and default policy experiments with permutations)

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7935:


Attachment: HBASE-7935-addendum.patch

Noticed the comment on r w/ship it :) Here's the addendum. Only comment 
changes. I will commit together.

 make policy and compactor in default store engine separately pluggable (for 
 things like tier-based, and default policy experiments with permutations)
 -

 Key: HBASE-7935
 URL: https://issues.apache.org/jira/browse/HBASE-7935
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-7935-addendum.patch, HBASE-7935-v0.patch, 
 HBASE-7935-v0-with-7843.patch, HBASE-7935-v1.patch, HBASE-7935-v2.patch, 
 HBASE-7935-v3.patch, HBASE-7935-v3.patch


 Technically, StoreEngine can be used to achieve any permutations of things, 
 but to make it more convenient to replace compaction policy/compactor in 
 standard schemes like tier-based, we can add separate hooks in 
 DefaultStoreEngine (as long as custom ones conform to its default 
 expectations e.g. flat list of sorted files, etc.)



[jira] [Commented] (HBASE-7935) make policy and compactor in default store engine separately pluggable (for things like tier-based, and default policy experiments with permutations)

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595116#comment-13595116
 ] 

Hadoop QA commented on HBASE-7935:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12572392/HBASE-7935-addendum.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4698//console

This message is automatically generated.

 make policy and compactor in default store engine separately pluggable (for 
 things like tier-based, and default policy experiments with permutations)
 -

 Key: HBASE-7935
 URL: https://issues.apache.org/jira/browse/HBASE-7935
 Project: HBase
  Issue Type: Improvement
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-7935-addendum.patch, HBASE-7935-v0.patch, 
 HBASE-7935-v0-with-7843.patch, HBASE-7935-v1.patch, HBASE-7935-v2.patch, 
 HBASE-7935-v3.patch, HBASE-7935-v3.patch


 Technically, StoreEngine can be used to achieve any permutations of things, 
 but to make it more convenient to replace compaction policy/compactor in 
 standard schemes like tier-based, we can add separate hooks in 
 DefaultStoreEngine (as long as custom ones conform to its default 
 expectations e.g. flat list of sorted files, etc.)



[jira] [Commented] (HBASE-8011) Refactor ImportTsv

2013-03-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595128#comment-13595128
 ] 

Nick Dimiduk commented on HBASE-8011:
-

I'd appreciate their consolidation. I think the only difference between them 
would be the mapper implementation. How about a separate JIRA?

 Refactor ImportTsv
 --

 Key: HBASE-8011
 URL: https://issues.apache.org/jira/browse/HBASE-8011
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, Usability
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor

 ImportTsv is a little goofy.
  - It doesn't use the Tool,Configured interfaces like a mapreduce job should.
  - It has a static HBaseAdmin field that must be initialized before the 
 intended API of createSubmittableJob can be invoked.
  - TsvParser is critical to the default mapper implementation but is 
 unavailable to user custom mapper implementations without forcing them into 
 the o.a.h.h.mapreduce namespace.
  - The configuration key constants are not public.



[jira] [Commented] (HBASE-7958) Statistics per-column family per-region

2013-03-06 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595132#comment-13595132
 ] 

Jesse Yates commented on HBASE-7958:


So it looks like there is a desire for a pretty large range of possible 
statistics. I'd rather we don't get bogged down in what specific statistics we 
want, but push more towards a design discussion around enabling people to 
capture these statistics. We know we want them, the question is how :)

Once we have the mechanisms in place to read/write a stats table for an 
individual stat, we can much more easily expand that to support stats at 
different tie-in places. The 'at compaction time histogram' seemed like an easy 
enough starting place for _one type of stat_, but that should not necessarily 
limit the possible stats that can be collected; it's an immediate use-case for 
a general statistics table.

Stepping back, it seems to me that we can have a basic set of statistics that 
you can enable for a table at creation time (or even turn on later). We 
then also need a mechanism to let people add their own statistics easily 
(thinking a CP hook here). From there, we just need to have a mechanism to 
make it easy to access each statistic.

I don't think any of the above proposals really changes my proposed 
outline-patch besides making it easy(easier?) to hook in custom stat 
implementations, a clean dynamic loading mechanism (from the various //TODOs 
for CP hooks), and a little more utility in the StatisticsTable class to make 
it easy to read a stat.

Sound reasonable?

 Statistics per-column family per-region
 ---

 Key: HBASE-7958
 URL: https://issues.apache.org/jira/browse/HBASE-7958
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.96.0
Reporter: Jesse Yates
Assignee: Jesse Yates
 Fix For: 0.96.0

 Attachments: hbase-7958_rough-cut-v0.patch


 Originating from this discussion on the dev list: 
 http://search-hadoop.com/m/coDKU1urovS/Simple+stastics+per+region/v=plain
 Essentially, we should have built-in statistics gathering for HBase tables. 
 This allows clients to have a better understanding of the distribution of 
 keys within a table and a given region. We could also surface this 
 information via the UI.
 There are a couple different proposals from the email, the overview is this:
 We add in something on compactions that gathers stats about the keys that are 
 written and then we surface them to a table.
 The possible proposals include:
 *How to implement it?*
 # Coprocessors - 
 ** advantage - it easily plugs in and people could pretty easily add their 
 own statistics. 
 ** disadvantage - UI elements would also require this, we get into dependent 
 loading, which leads down the OSGi path. Also, these CPs need to be installed 
 _after_ all the other CPs on compaction to ensure they see exactly what gets 
 written (doable, but a pain)
 # Built into HBase as a custom scanner
 ** advantage - always goes in the right place and no need to muck about with 
 loading CPs etc.
 ** disadvantage - less pluggable, at least for the initial cut
 *Where do we store data?*
 # .META.
 ** advantage - it's an existing table, so we can jam it into another CF there
 ** disadvantage - this would make META much larger, possibly leading to 
 splits AND will make it much harder for other processes to read the info
 # A new stats table
 ** advantage - cleanly separates out the information from META
 ** disadvantage - should use a 'system table' idea to prevent accidental 
 deletion, manipulation by arbitrary clients, but still allow clients to read 
 it.
 Once we have this framework, we can then move to an actual implementation of 
 various statistics.
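
 As a very rough picture of the framework part, here is a Python sketch of
 pluggable collectors fed during a compaction pass. Nothing in it comes from
 the attached rough-cut patch: KeyCount, MinMaxKey and compact_with_stats are
 hypothetical names, a plain dict stands in for the stats table, and the
 "compaction" is just a loop over the keys being rewritten.

```python
from typing import Dict, Iterable, List, Optional, Tuple


class KeyCount:
    """Counts the keys written by one compaction."""
    name = "key_count"

    def __init__(self) -> None:
        self.count = 0

    def observe(self, key: str) -> None:
        self.count += 1

    def result(self) -> int:
        return self.count


class MinMaxKey:
    """Tracks the smallest and largest key written by one compaction."""
    name = "min_max_key"

    def __init__(self) -> None:
        self.low: Optional[str] = None
        self.high: Optional[str] = None

    def observe(self, key: str) -> None:
        self.low = key if self.low is None else min(self.low, key)
        self.high = key if self.high is None else max(self.high, key)

    def result(self) -> Tuple[Optional[str], Optional[str]]:
        return (self.low, self.high)


def compact_with_stats(region: str, family: str, keys: Iterable[str],
                       collectors: list, stats_table: Dict) -> List[str]:
    """Pretend-compact `keys`, letting every collector see each written key."""
    written = []
    for key in keys:
        written.append(key)  # stands in for writing the cell to the new file
        for collector in collectors:
            collector.observe(key)
    # One stats row per (region, column family, statistic name).
    for collector in collectors:
        stats_table[(region, family, collector.name)] = collector.result()
    return written


stats: Dict = {}
compact_with_stats("r1", "cf", ["b", "a", "c"], [KeyCount(), MinMaxKey()], stats)
```

 Adding a new statistic in this shape means writing one more class with
 observe() and result(); the loop and the table layout stay the same, which is
 the kind of pluggability the coprocessor option is after.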



[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595133#comment-13595133
 ] 

Enis Soztutar commented on HBASE-8015:
--

Hmm, ns.table is definitely more intuitive than /. SQL also uses that 
convention. 

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595152#comment-13595152
 ] 

Sergey Shelukhin commented on HBASE-8015:
-

My point about backward compat was about the storage, not display. I.e. if 
people want to create a table a.b without a namespace and get confused while 
viewing some table list, I think it's ok. Maybe we can disallow tables with 
dots.
If they have an existing table a.b and the NS is stored separately, again the 
worst thing that happens is the user gets confused, reads the release notes and 
renames it. Should be ok too.
But if we /store/ the NS in the table name, then with table a.b, without 
special handling, the system gets confused (thinks it's in NS a) after 
upgrade :)
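
The storage concern can be shown with a few lines of Python. This is only an
illustration: parse_embedded is a made-up helper, and the "default" namespace
name is an assumption, not something from the design doc.

```python
from typing import Tuple


def parse_embedded(stored_name: str) -> Tuple[str, str]:
    """Read a stored name as 'ns.table', splitting on the first dot.

    Names without a dot are put in an assumed 'default' namespace.
    """
    ns, sep, table = stored_name.partition(".")
    return (ns, table) if sep else ("default", stored_name)


# A table created before namespaces existed, literally named "a.b".
legacy_name = "a.b"

# If the NS is embedded in the stored name, the old table is misread after upgrade.
embedded_view = parse_embedded(legacy_name)

# If the NS is stored as a separate field, the old name survives untouched.
separate_view = ("default", legacy_name)
```

Here embedded_view puts the old table in namespace a with table name b, while
separate_view keeps it as table a.b in the default namespace. Only the first
case needs special handling during upgrade.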

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Created] (HBASE-8016) HBase as an embeddable library, but still using HDFS

2013-03-06 Thread eric baldeschwieler (JIRA)
eric baldeschwieler created HBASE-8016:
--

 Summary: HBase as an embeddable library, but still using HDFS
 Key: HBASE-8016
 URL: https://issues.apache.org/jira/browse/HBASE-8016
 Project: HBase
  Issue Type: Wish
Reporter: eric baldeschwieler


This goes in the strange idea bucket...  

I'm looking for a tool to allow folks to store key-value data into HDFS so that 
hadoop companion layers & apps don't need to rely on either an external 
database or a NoSQL store.  HBase itself is often not running on such clusters 
and we cannot add it as a requirement for many of the use cases I'm 
considering.

But...  what if we produced a library that provided the basic HBase API 
(creating tables & putting / getting values...) and this library was pointed at 
HDFS for durability?  This library would effectively embed a region server and 
the master in a node and provide only API level access within that JVM.  We 
would skip marshaling & networking, gaining a fair amount of efficiency.  An 
application using this library would gain all of the advantages of HBase 
without adding any additional administrative complexity of managing HBase as a 
distributed service.

Thoughts?

Example use cases...  Right now a typical hadoop install runs several services 
that use databases (Oozie, HCat, Hive ...).  What if some of these could be 
ported to use HDFS itself as their store with the HBase API provided to manage 
their data?



[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Attachment: hbase-7721_v3.patch

Fixed javadoc warning, and lines over 100. 

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch


 Thanks to Lars' local transactions patch (HBASE-5229), we can entertain the 
 possibility of doing local transactions within META regions.
 We need this mainly for region splits and merges. Clients scan the META 
 concurrently with the split/merge operations, and to prevent the clients from 
 seeing overlapping region boundaries or holes in META, we jump through hoops. 
 For more background, see BlockingMetaScannerVisitor, HBASE-5986, and my 
 comments at https://reviews.apache.org/r/8716/. 
 Now, for the actual implementation options: 
  1. As outlined in http://hadoop-hbase.blogspot.com/2012_02_01_archive.html, 
- We have to implement a Custom RegionSplitPolicy for the META regions to 
 ensure that a table's regions are always co-located in the same META region. 
 Then we can add MultiRowMutationEndpoint as a system level coprocessor, and 
 use it for META operations. 
  2. Do something like HBASE-7716, and expose local atomic multi-row operations
 as a native API.
  3. Move META to zookeeper. Use zookeeper.multi.  
 Then we can change region split / merge logic to make use of atomic META 
 operations. 
 Thoughts, suggestions? 



[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Status: Open  (was: Patch Available)

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch


 Thanks to Lars' local transactions patch (HBASE-5229), we can entertain the 
 possibility of doing local transactions within META regions.
 We need this mainly for region splits and merges. Clients scan the META 
 concurrently with the split/merge operations, and to prevent the clients from 
 seeing overlapping region boundaries or holes in META, we jump through hoops. 
 For more background, see BlockingMetaScannerVisitor, HBASE-5986, and my 
 comments at https://reviews.apache.org/r/8716/. 
 Now, for the actual implementation options: 
  1. As outlined in http://hadoop-hbase.blogspot.com/2012_02_01_archive.html, 
- We have to implement a Custom RegionSplitPolicy for the META regions to 
 ensure that a table's regions are always co-located in the same META region. 
 Then we can add MultiRowMutationEndpoint as a system level coprocessor, and 
 use it for META operations. 
  2. Do smt like HBASE-7716, and expose local atomic multi-row operations as a 
 native API.
  3. Move META to zookeeper. Use zookeeper.multi.  
 Then we can change region split / merge logic to make use of atomic META 
 operations. 
 Thoughts, suggestions? 
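
To make the goal concrete, here is a minimal sketch (not HBase code) of what an atomic multi-row update buys a split. Everything in it is an assumption made for illustration: the class `MetaSketch`, the methods `splitAtomically`, `scan` and `isContiguous`, and the choice of an in-memory `TreeMap` behind a read/write lock as a stand-in for a single META region. The point is only that when the parent row and both daughter rows change in one step, a concurrent scanner sees either the old layout or the new one, never a hole or an overlap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical in-memory stand-in for one META region; not HBase code. */
public class MetaSketch {
    // start key -> end key; "" as an end key means "to the end of the table"
    private final TreeMap<String, String> regions = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public MetaSketch() {
        regions.put("", ""); // one region covering the whole key space
    }

    /** Rewrites the parent row as two daughter rows in one atomic step. */
    public void splitAtomically(String parentStart, String splitKey) {
        lock.writeLock().lock();
        try {
            String parentEnd = regions.get(parentStart);
            if (parentEnd == null) {
                throw new IllegalArgumentException("no region starts at '" + parentStart + "'");
            }
            boolean insideParent = splitKey.compareTo(parentStart) > 0
                    && (parentEnd.isEmpty() || splitKey.compareTo(parentEnd) < 0);
            if (!insideParent) {
                throw new IllegalArgumentException("split key is outside the parent region");
            }
            regions.put(parentStart, splitKey); // daughter A: [parentStart, splitKey)
            regions.put(splitKey, parentEnd);   // daughter B: [splitKey, parentEnd)
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Returns a consistent snapshot of {startKey, endKey} rows in key order. */
    public List<String[]> scan() {
        lock.readLock().lock();
        try {
            List<String[]> rows = new ArrayList<>();
            for (Map.Entry<String, String> e : regions.entrySet()) {
                rows.add(new String[] {e.getKey(), e.getValue()});
            }
            return rows;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** True when the rows tile the key space with no hole and no overlap. */
    public static boolean isContiguous(List<String[]> rows) {
        String expectedStart = "";
        for (String[] row : rows) {
            if (!row[0].equals(expectedStart)) {
                return false;
            }
            expectedStart = row[1];
        }
        return !rows.isEmpty() && expectedStart.isEmpty();
    }
}
```

Which of the three options above would provide this atomicity in practice is exactly the open question; the sketch takes no position on it.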



[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Status: Patch Available  (was: Open)

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch


[jira] [Commented] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595192#comment-13595192
 ] 

Hadoop QA commented on HBASE-7721:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572413/hbase-7721_v3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 21 new 
or modified tests.

{color:red}-1 patch{color}.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4699//console

This message is automatically generated.

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch


[jira] [Commented] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595197#comment-13595197
 ] 

Jeffrey Zhong commented on HBASE-7982:
--

[~saint@gmail.com] Two thumbs up for your changes. I like the new 
isMatchingPath method, which basically removes all the noise.

Thanks,
-Jeffrey 

 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: 7982v3.txt, hbase-7982-combined_1.patch, 
 hbase-7982-combined.patch, hbase-7982-huge-logging.patch, 
 hbase-7982-NPE_2.patch, hbase-7982-NPE.patch


 I was trying to look at why the odd time Hudson OOMEs trying to make a report 
 on 0.95 build #4 https://builds.apache.org/job/hbase-0.95/4/console:
 {code}
 ERROR: Failed to archive test reports
 hudson.util.IOException2: remote file operation failed: 
 /home/jenkins/jenkins-slave/workspace/hbase-0.95 at 
 hudson.remoting.Channel@151a4e3e:ubuntu3
   at hudson.FilePath.act(FilePath.java:861)
   at hudson.FilePath.act(FilePath.java:838)
   at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:87)
   at 
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
   at java.nio.CharBuffer.allocate(CharBuffer.java:329)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
   at java.nio.charset.Charset.decode(Charset.java:791)
   at hudson.tasks.junit.SuiteResult.<init>(SuiteResult.java:215)
 ...
 {code}
 We are trying to allocate a big buffer and failing.
 Looking at reports being generated, we have quite a few that are > 10MB in 
 size:
 {code}
 durruti:0.95 stack$ find hbase-* -type f -size +1k -exec ls -la {} \;
 -rw-r--r--@ 1 stack  staff  11126492 Feb 27 06:14 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.backup.TestHFileArchiving-output.txt
 -rw-r--r--@ 1 stack  staff  13296009 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
 -rw-r--r--@ 1 stack  staff  10541898 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMultiParallel-output.txt
 -rw-r--r--@ 1 stack  staff  25344601 Feb 27 05:51 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient-output.txt
 -rw-r--r--@ 1 stack  staff  17966969 Feb 27 06:12 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction-output.txt
 -rw-r--r--@ 1 stack  staff  17699068 Feb 27 06:09 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit-output.txt
 -rw-r--r--@ 1 stack  staff  17701832 Feb 27 06:07 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed-output.txt
 -rw-r--r--@ 1 stack  staff  717853709 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailover-output.txt
 -rw-r--r--@ 1 stack  staff  563616793 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
 {code}
 ... with TestReplicationQueueFailover* being an order of magnitude bigger than 
 the others.
 Looking in the test I see both spewing between 800 and 900 thousand lines in 
 about a minute.  Here is their fixation:
 {code}
 8908998 2013-02-27 06:17:48,176 ERROR 
 [RegionServer:1;hemera.apache.org,35712,1361945801803.logSyncer] 
 wal.FSHLog$LogSyncer(1012): Error while syncing, requesting close of hlog.
 8908999 java.io.IOException: Filesystem closed
 8909000 ,...at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:319)
 8909001 ,...at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
 8909002 ,...at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3843)
 8909003 ,...at 
 org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
 8909004 ,...at 
 org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
 8909005 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:248)
 8909006 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1120)
 8909007 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1058)
 8909008 ,...at 
 

[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Status: Open  (was: Patch Available)

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch


[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Attachment: hbase-7721_v4.patch

Rebase 2. 

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch, hbase-7721_v4.patch


[jira] [Updated] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Enis Soztutar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Enis Soztutar updated HBASE-7721:
-

Status: Patch Available  (was: Open)

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch, hbase-7721_v4.patch


[jira] [Commented] (HBASE-8016) HBase as an embeddable library, but still using HDFS

2013-03-06 Thread Nick Dimiduk (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595207#comment-13595207
 ] 

Nick Dimiduk commented on HBASE-8016:
-

Like, HBase APIs for Derby or SQLite usage scenarios?

 HBase as an embeddable library, but still using HDFS
 

 Key: HBASE-8016
 URL: https://issues.apache.org/jira/browse/HBASE-8016
 Project: HBase
  Issue Type: Wish
Reporter: eric baldeschwieler

 This goes in the strange idea bucket...  
 I'm looking for a tool to allow folks to store key-value data into HDFS so 
 that hadoop companion layers & apps don't need to rely on either an external 
 database or a NoSQL store.  HBase itself is often not running on such 
 clusters and we cannot add it as a requirement for many of the use cases I'm 
 considering.
 But...  what if we produced a library that provided the basic HBase API 
 (creating tables & putting / getting values...) and this library was pointed 
 at HDFS for durability?  This library would effectively embed a region server 
 and the master in a node and provide only API level access within that 
 JVM.  We would skip marshaling & networking, gaining a fair amount of 
 efficiency.  An application using this library would gain all of the 
 advantages of HBase without adding any additional administrative complexity 
 of managing HBase as a distributed service.
 Thoughts?
 Example use cases...  Right now a typical hadoop install runs several services 
 that use databases (Oozie, HCat, Hive ...).  What if some of these could be 
 ported to use HDFS itself as their store with the HBase API provided to 
 manage their data?



[jira] [Updated] (HBASE-8011) Refactor ImportTsv

2013-03-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-8011:


Attachment: 0001-HBASE-8011-Refactor-ImportTsv.patch

Here's a start. I think TsvImporterMapper and TsvParser should be further 
separated from ImportTsv itself. This would be consistent with [~enis]'s 
request to merge with Import. I'm not sure how to handle validation in those 
cases though.

 Refactor ImportTsv
 --

 Key: HBASE-8011
 URL: https://issues.apache.org/jira/browse/HBASE-8011
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, Usability
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-8011-Refactor-ImportTsv.patch


 ImportTsv is a little goofy.
  - It doesn't use the Tool,Configured interfaces like a mapreduce job should.
  - It has a static HBaseAdmin field that must be initialized before the 
 intended API of createSubmittableJob can be invoked.
  - TsvParser is critical to the default mapper implementation but is 
 unavailable to user custom mapper implementations without forcing them into 
 the o.a.h.h.mapreduce namespace.
  - The configuration key constants are not public.



[jira] [Updated] (HBASE-8011) Refactor ImportTsv

2013-03-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-8011:


Status: Patch Available  (was: Open)

 Refactor ImportTsv
 --

 Key: HBASE-8011
 URL: https://issues.apache.org/jira/browse/HBASE-8011
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce, Usability
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-8011-Refactor-ImportTsv.patch


[jira] [Updated] (HBASE-7996) Clean up resource leak in MultiTableInputFormat

2013-03-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7996:


Status: Open  (was: Patch Available)

 Clean up resource leak in MultiTableInputFormat
 ---

 Key: HBASE-7996
 URL: https://issues.apache.org/jira/browse/HBASE-7996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-7996-Avoid-leaking-table-handle.patch


 MultiTableInputFormatBase#getSplits() will always leak an open HTable 
 instance when {{throw new IOException("Expecting at least one region for 
 table ...")}} is called. It can potentially leak throughout that method body.
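
The usual remedy for this kind of leak is to release the handle in a finally block, so that every exit from the method body closes it, including the exception path. The sketch below shows only that pattern and is not the attached patch. `SplitLeakSketch`, `TableHandle` and its `openHandles` counter are hypothetical stand-ins for illustration; no HBase classes are used.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SplitLeakSketch {
    /** Hypothetical stand-in for an HTable handle; counts handles still open. */
    public static final class TableHandle implements Closeable {
        public static int openHandles = 0;
        private final List<String> regionStartKeys;
        private boolean closed = false;

        public TableHandle(List<String> regionStartKeys) {
            this.regionStartKeys = regionStartKeys;
            openHandles++;
        }

        public List<String> getStartKeys() {
            return regionStartKeys;
        }

        @Override
        public void close() {
            if (!closed) { // closing twice must not corrupt the counter
                closed = true;
                openHandles--;
            }
        }
    }

    /** Returns one split per region start key; closes the handle on every path. */
    public static List<String> getSplits(List<String> regionStartKeys) throws IOException {
        TableHandle table = new TableHandle(regionStartKeys);
        try {
            List<String> keys = table.getStartKeys();
            if (keys.isEmpty()) {
                // This throw no longer leaks: the finally block below still runs.
                throw new IOException("Expecting at least one region for table");
            }
            return new ArrayList<>(keys);
        } finally {
            table.close();
        }
    }
}
```

On Java 7 and later, try-with-resources gives the same guarantee with less code; the explicit finally form is shown because it works on older JVMs as well.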



[jira] [Updated] (HBASE-7996) Clean up resource leak in MultiTableInputFormat

2013-03-06 Thread Nick Dimiduk (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-7996:


Hadoop Flags: Reviewed
  Status: Patch Available  (was: Open)

 Clean up resource leak in MultiTableInputFormat
 ---

 Key: HBASE-7996
 URL: https://issues.apache.org/jira/browse/HBASE-7996
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nick Dimiduk
Assignee: Nick Dimiduk
Priority: Minor
 Attachments: 0001-HBASE-7996-Avoid-leaking-table-handle.patch


[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595217#comment-13595217
 ] 

Francis Liu commented on HBASE-8015:


I see, I am assuming we have to store the namespace as part of the region name, and 
store names fully qualified on hdfs/zookeeper/etc. Otherwise we would be forced to 
make all table names globally unique, which would differ from database 
semantics. 

Another concern is whether a user can run the same application code against 0.96.

i.e. if I wanted to scan:

 scan 'foo.bar'

Pre-NS this would scan table 'foo.bar'. Post-NS, the system would parse this 
out as table bar in namespace foo. One way we could deal with this is to read the 
table name as Post-NS; if that table doesn't exist, read it as Pre-NS; if that 
doesn't exist either, then fail.
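
The fallback order described above can be sketched as follows. This is only an illustration of the lookup order, not a proposed API. `TableNameResolver` and its methods are hypothetical, the two in-memory collections stand in for whatever catalog would really hold table names, and splitting at the first '.' is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical resolver: try the Post-NS reading first, then Pre-NS, then fail. */
public class TableNameResolver {
    private final Map<String, Set<String>> namespaces = new HashMap<>();
    private final Set<String> legacyTables = new HashSet<>();

    public void addNamespacedTable(String namespace, String table) {
        namespaces.computeIfAbsent(namespace, k -> new HashSet<>()).add(table);
    }

    public void addLegacyTable(String name) {
        legacyTables.add(name);
    }

    public String resolve(String name) {
        // Post-NS reading: text before the first '.' is the namespace (assumption).
        int dot = name.indexOf('.');
        if (dot > 0 && dot < name.length() - 1) {
            String namespace = name.substring(0, dot);
            String table = name.substring(dot + 1);
            Set<String> tables =
                    namespaces.getOrDefault(namespace, Collections.<String>emptySet());
            if (tables.contains(table)) {
                return "namespace=" + namespace + ", table=" + table;
            }
        }
        // Pre-NS reading: the whole string is a table name.
        if (legacyTables.contains(name)) {
            return "legacy table=" + name;
        }
        throw new IllegalArgumentException("table not found: " + name);
    }
}
```

One consequence of this ordering is visible in the sketch: if both a legacy table 'foo.bar' and a table bar in namespace foo exist, the Post-NS reading wins and the legacy table becomes unreachable under that name.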


 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8015) Support for Namespaces

2013-03-06 Thread Francis Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595222#comment-13595222
 ] 

Francis Liu commented on HBASE-8015:


{quote}
The latter may not necessarily be backward compatible...
{quote}
Can you give an example of this? Thinking about it more, it seems to me you'll 
just end up with a lot of namespaces?

 Support for Namespaces
 --

 Key: HBASE-8015
 URL: https://issues.apache.org/jira/browse/HBASE-8015
 Project: HBase
  Issue Type: New Feature
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: NamespaceDesign.pdf






[jira] [Commented] (HBASE-8016) HBase as an embeddable library, but still using HDFS

2013-03-06 Thread eric baldeschwieler (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595225#comment-13595225
 ] 

eric baldeschwieler commented on HBASE-8016:


Yes, that is a good analogy.

 HBase as an embeddable library, but still using HDFS
 

 Key: HBASE-8016
 URL: https://issues.apache.org/jira/browse/HBASE-8016
 Project: HBase
  Issue Type: Wish
Reporter: eric baldeschwieler

[jira] [Commented] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595232#comment-13595232
 ] 

Jeffrey Zhong commented on HBASE-7982:
--

[~saint@gmail.com] Oh, actually the line 
{code} if (FSUtils.isMatchingPath(rootdir, logfile)) {code} doesn't check the 
prefix. My original intention was to check whether logfile already contains rootdir or 
not. If not, just append the rootdir. I think we could use your isMatchingPath to 
create a new function such as startWithPath(rootdir, logfile):

{code}
  public static boolean isStartingWithPath(final Path pathToSearch, final Path pathTail) {
    Path tmpPath = new Path(pathTail.toString());
    // pathTail cannot start with a path that is deeper than itself
    if (pathToSearch.depth() > tmpPath.depth()) return false;
    // walk up pathTail until it is as deep as pathToSearch
    while (tmpPath.depth() > pathToSearch.depth()) {
      tmpPath = tmpPath.getParent();
    }
    return isMatchingPath(pathToSearch, tmpPath);
  }
{code}
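
For anyone who wants to try the depth-walking idea without a Hadoop classpath, here is a rough equivalent on top of java.nio.file.Path. It is a sketch under assumptions, not the FSUtils code: `PathPrefixSketch` is a hypothetical class, `getNameCount()` plays the role of `depth()`, and plain `equals` stands in for isMatchingPath, so any scheme/authority handling that method does is not modelled.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PathPrefixSketch {
    /** True when pathTail lies at or below pathToSearch. */
    public static boolean isStartingWithPath(Path pathToSearch, Path pathTail) {
        Path tmpPath = pathTail;
        // A shallower path cannot start with a deeper one.
        if (pathToSearch.getNameCount() > tmpPath.getNameCount()) {
            return false;
        }
        // Strip trailing components until both paths have the same depth.
        while (tmpPath.getNameCount() > pathToSearch.getNameCount()) {
            tmpPath = tmpPath.getParent();
            if (tmpPath == null) {
                return false; // a relative path ran out of parents
            }
        }
        return pathToSearch.equals(tmpPath);
    }

    public static void main(String[] args) {
        Path rootdir = Paths.get("/hbase");
        System.out.println(isStartingWithPath(rootdir, Paths.get("/hbase/.logs/rs1/wal.1")));
        System.out.println(isStartingWithPath(rootdir, Paths.get("/other/.logs/rs1/wal.1")));
    }
}
```

java.nio already offers `Path.startsWith` for this purpose; the loop is spelled out only to mirror the approach in the comment above.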


 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: 7982v3.txt, hbase-7982-combined_1.patch, 
 hbase-7982-combined.patch, hbase-7982-huge-logging.patch, 
 hbase-7982-NPE_2.patch, hbase-7982-NPE.patch



[jira] [Updated] (HBASE-7482) Port HBASE-7442 HBase remote CopyTable not working when security enabled to trunk

2013-03-06 Thread Gary Helmling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gary Helmling updated HBASE-7482:
-

Attachment: HBASE-7482-v2.patch

Passing the cluster ID through Configuration was originally a hack to avoid 
more invasive changes to ClientCache and the RPC engine layers that would have 
been required to directly represent cluster ID.

Now that we're already undertaking the removal of ClientCache and refactoring 
of RPC engines in 0.95/trunk, I think we're better off removing the 
Configuration pass-through completely.

Here's a modified patch that removes the mucking with Configuration and just 
passes through cluster ID as an honest-to-goodness parameter.  I think this 
winds up being cleaner.

James, let me know your thoughts or if you can picture situations this might 
break.

 Port HBASE-7442 HBase remote CopyTable not working when security enabled to 
 trunk
 -

 Key: HBASE-7482
 URL: https://issues.apache.org/jira/browse/HBASE-7482
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: James Kinley
Priority: Critical
 Fix For: 0.95.0

 Attachments: HBASE-7482-trunk.patch, HBASE-7482-v2.patch


 Excerpt about the choice of solution from :
 The first option was actually quite messy to implement. {{clusterId}} and 
 {{conf}} are fixed in *{{HBaseClient}}* when it's created and cached by 
 *{{SecureRpcEngine}}*, so to implement the fix here I would have had to pass 
 the different cluster {{confs}} up through *{{HConnectionManager}}* and 
 *{{HBaseRPC}}* in order to override the clusterId in 
 *{{SecureClient#SecureConnection}}*.
 I've gone with the second option of creating and caching different 
 *{{SecureClients}}* for the local and remote clusters in 
 *{{SecureRpcEngine}}* - keyed off of the {{clusterId}} instead of the default 
 *{{SocketFactory}}*. I think this is a cleaner solution.



[jira] [Commented] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595238#comment-13595238
 ] 

Jeffrey Zhong commented on HBASE-7982:
--

Fixing some typos:
{code}
  public static boolean isStartingWithPath(final Path pathToSearch, final Path pathTail) {
    Path tmpPath = pathTail;
    // A deeper path can never be a prefix of a shallower one.
    if (pathToSearch.depth() > tmpPath.depth()) return false;
    // Walk pathTail up until both paths have the same depth.
    while (tmpPath.depth() > pathToSearch.depth()) {
      tmpPath = tmpPath.getParent();
    }
    return isMatchingPath(pathToSearch, tmpPath);
  }
{code}
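
For anyone who wants to try the trimming idea without a Hadoop classpath, here is a rough stand-alone sketch using java.nio.file.Path. It is an illustration only: the class name PathPrefixSketch is made up, getNameCount() stands in for Hadoop's Path.depth(), and a plain equals() stands in for isMatchingPath, whose exact semantics are not shown in this comment.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-alone analogue; not the HBase FSUtils code.
class PathPrefixSketch {

    /** Returns true if pathTail is pathToSearch itself or lies somewhere below it. */
    public static boolean isStartingWithPath(Path pathToSearch, Path pathTail) {
        Path tmpPath = pathTail;
        // A deeper path can never be a prefix of a shallower one.
        if (pathToSearch.getNameCount() > tmpPath.getNameCount()) return false;
        // Trim trailing components until both paths have the same depth.
        while (tmpPath.getNameCount() > pathToSearch.getNameCount()) {
            tmpPath = tmpPath.getParent();
        }
        // Assumption: plain equality replaces isMatchingPath here.
        return pathToSearch.equals(tmpPath);
    }

    public static void main(String[] args) {
        Path logs = Paths.get("/hbase/.logs");
        System.out.println(isStartingWithPath(logs, Paths.get("/hbase/.logs/rs1/wal.1")));
        System.out.println(isStartingWithPath(logs, Paths.get("/hbase/.oldlogs/wal.1")));
        System.out.println(isStartingWithPath(logs, Paths.get("/hbase")));
    }
}
```

The sketch assumes absolute paths; relative paths would need a null check on getParent().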

 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: 7982v3.txt, hbase-7982-combined_1.patch, 
 hbase-7982-combined.patch, hbase-7982-huge-logging.patch, 
 hbase-7982-NPE_2.patch, hbase-7982-NPE.patch


 I was trying to look at why the odd time Hudson OOMEs trying to make a report 
 on 0.95 build #4 https://builds.apache.org/job/hbase-0.95/4/console:
 {code}
 ERROR: Failed to archive test reports
 hudson.util.IOException2: remote file operation failed: 
 /home/jenkins/jenkins-slave/workspace/hbase-0.95 at 
 hudson.remoting.Channel@151a4e3e:ubuntu3
   at hudson.FilePath.act(FilePath.java:861)
   at hudson.FilePath.act(FilePath.java:838)
   at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:87)
   at 
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
   at java.nio.CharBuffer.allocate(CharBuffer.java:329)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
   at java.nio.charset.Charset.decode(Charset.java:791)
   at hudson.tasks.junit.SuiteResult.<init>(SuiteResult.java:215)
 ...
 {code}
 We are trying to allocate a big buffer and failing.
 Looking at reports being generated, we have quite a few that are > 10MB in 
 size:
 {code}
 durruti:0.95 stack$ find hbase-* -type f -size +1k -exec ls -la {} \;
 -rw-r--r--@ 1 stack  staff  11126492 Feb 27 06:14 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.backup.TestHFileArchiving-output.txt
 -rw-r--r--@ 1 stack  staff  13296009 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
 -rw-r--r--@ 1 stack  staff  10541898 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMultiParallel-output.txt
 -rw-r--r--@ 1 stack  staff  25344601 Feb 27 05:51 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient-output.txt
 -rw-r--r--@ 1 stack  staff  17966969 Feb 27 06:12 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction-output.txt
 -rw-r--r--@ 1 stack  staff  17699068 Feb 27 06:09 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit-output.txt
 -rw-r--r--@ 1 stack  staff  17701832 Feb 27 06:07 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed-output.txt
 -rw-r--r--@ 1 stack  staff  717853709 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailover-output.txt
 -rw-r--r--@ 1 stack  staff  563616793 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
 {code}
 ... with TestReplicationQueueFailover* being an order of magnitude bigger than 
 the others.
 Looking in the test I see both spewing between 800 and 900 thousand lines in 
 about a minute.  Here is their fixation:
 {code}
 8908998 2013-02-27 06:17:48,176 ERROR 
 [RegionServer:1;hemera.apache.org,35712,1361945801803.logSyncer] 
 wal.FSHLog$LogSyncer(1012): Error while syncing, requesting close of hlog.
 8908999 java.io.IOException: Filesystem closed
 8909000 ,...at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:319)
 8909001 ,...at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
 8909002 ,...at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3843)
 8909003 ,...at 
 org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
 8909004 ,...at 
 org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
 8909005 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:248)
 8909006 ,...at 
 

[jira] [Commented] (HBASE-8006) use FSUtils to get/set hbase.rootdir

2013-03-06 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595239#comment-13595239
 ] 

Hudson commented on HBASE-8006:
---

Integrated in HBase-TRUNK #3920 (See 
[https://builds.apache.org/job/HBase-TRUNK/3920/])
HBASE-8006 use FSUtils to get/set hbase.rootdir (Revision 1453521)

 Result = FAILURE
mbertozzi : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogPrettyPrinter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLogSplitter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/FSUtils.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/RegionSplitter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/hbck/HFileCorruptionChecker.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/coprocessor/TestWALObserver.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/fs/TestBlockReorder.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestCatalogJanitor.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterNoCluster.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/master/cleaner/TestHFileLinkCleaner.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLog.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestHLogSplit.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/snapshot/TestFlushSnapshotFromClient.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/TestMergeTool.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRebuildTestCore.java


 use FSUtils to get/set hbase.rootdir
 

 Key: HBASE-8006
 URL: https://issues.apache.org/jira/browse/HBASE-8006
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
  Labels: noob
 Attachments: HBASE-8006-v0.patch


 We have several different ways around the code to get the root dir:
 {code}
 this.conf.get("hbase.rootdir")
 new Path(conf.get(HConstants.HBASE_DIR));
 fs.makeQualified(new Path(this.c.get(HConstants.HBASE_DIR)));
 FSUtils.getRootDir(conf)
 {code}
 We also have lots of places that set the default filesystem like this:
 {code}
 this.conf.set("fs.default.name", fs.getUri().toString());
 this.conf.set("fs.defaultFS", fs.getUri().toString());
 {code}
 Replace all of these with FSUtils so that there is one single way to do this.
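
As a rough illustration of the "one single way" idea, the sketch below funnels every read and write of the root dir and the default filesystem through one helper class. It is self-contained on purpose: java.util.Properties stands in for Hadoop's Configuration, and the class and method names are hypothetical rather than taken from the attached patch.

```java
import java.util.Properties;

// Hypothetical helper; Properties stands in for Hadoop's Configuration.
class RootDirSketch {
    // Key name as it appears in the issue description.
    static final String ROOT_DIR_KEY = "hbase.rootdir";

    /** The single place that knows how to read the root dir. */
    public static String getRootDir(Properties conf) {
        String dir = conf.getProperty(ROOT_DIR_KEY);
        if (dir == null) {
            throw new IllegalStateException(ROOT_DIR_KEY + " is not set");
        }
        return dir;
    }

    /** The single place that knows how to write the root dir. */
    public static void setRootDir(Properties conf, String dir) {
        conf.setProperty(ROOT_DIR_KEY, dir);
    }

    /** Sets both spellings of the default-filesystem key in one call. */
    public static void setFsDefault(Properties conf, String fsUri) {
        conf.setProperty("fs.defaultFS", fsUri);
        conf.setProperty("fs.default.name", fsUri);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setRootDir(conf, "hdfs://nn:8020/hbase");
        setFsDefault(conf, "hdfs://nn:8020");
        System.out.println(getRootDir(conf));
        System.out.println(conf.getProperty("fs.defaultFS"));
    }
}
```

Callers then never spell the key names themselves, so a later change to a key or to path qualification touches one class only.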



[jira] [Updated] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7055:


Attachment: HBASE-7055-v7.patch

Rebased the patch in light of all the recent changes. Now it actually only has 
the changes pertaining to tier-based compaction

 port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)
 --

 Key: HBASE-7055
 URL: https://issues.apache.org/jira/browse/HBASE-7055
 Project: HBase
  Issue Type: Task
  Components: Compaction
Affects Versions: 0.96.0
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Fix For: 0.95.0

 Attachments: HBASE-6371-squashed.patch, HBASE-6371-v2-squashed.patch, 
 HBASE-6371-v3-refactor-only-squashed.patch, 
 HBASE-6371-v4-refactor-only-squashed.patch, 
 HBASE-6371-v5-refactor-only-squashed.patch, HBASE-7055-v0.patch, 
 HBASE-7055-v1.patch, HBASE-7055-v2.patch, HBASE-7055-v3.patch, 
 HBASE-7055-v4.patch, HBASE-7055-v5.patch, HBASE-7055-v6.patch, 
 HBASE-7055-v7.patch


 See HBASE-6371 for details.



[jira] [Updated] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7055:


Status: Patch Available  (was: Open)




[jira] [Commented] (HBASE-7055) port HBASE-6371 tier-based compaction from 0.89-fb to trunk (with changes)

2013-03-06 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595251#comment-13595251
 ] 

Sergey Shelukhin commented on HBASE-7055:
-

https://reviews.apache.org/r/8460/ is the r




[jira] [Resolved] (HBASE-6388) Avoid potential data loss if the flush fails during regionserver shutdown

2013-03-06 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-6388.


Resolution: Fixed

 Avoid potential data loss if the flush fails during regionserver shutdown
 -

 Key: HBASE-6388
 URL: https://issues.apache.org/jira/browse/HBASE-6388
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Critical
 Fix For: 0.89-fb

 Attachments: 
 0001-HBASE-6388-89-fb-parallelize-close-and-avoid-deletin.patch


 During a controlled shutdown, Regionserver deletes HLogs even if 
 HRegion.close() fails. We should not be doing this.



[jira] [Resolved] (HBASE-5783) Faster HBase bulk loader

2013-03-06 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-5783.


Resolution: Fixed

 Faster HBase bulk loader
 

 Key: HBASE-5783
 URL: https://issues.apache.org/jira/browse/HBASE-5783
 Project: HBase
  Issue Type: New Feature
  Components: Client, IPC/RPC, Performance, regionserver
Reporter: Karthik Ranganathan
Assignee: Amitanand Aiyer

 We can get a 3x to 4x gain based on a prototype demonstrating this approach 
 in effect (hackily) over the MR bulk loader for very large data sets by doing 
 the following:
 1. Do direct multi-puts from HBase client using GZIP compressed RPCs
 2. Turn off WAL (we will ensure no data loss in another way)
 3. For each bulk load client, we need to:
 3.1 do a put
 3.2 get back a tracking cookie (memstoreTs or HLogSequenceId) per put
 3.3 be able to ask the RS if the tracking cookie has been flushed to disk
 4. For each client, we can mark it successful if the tracking cookie for the 
 last put it did (for every RS) makes it to disk. Otherwise the map task fails 
 and is retried.
 5. If the last put did not make it to disk for a timeout (say a second or so) 
 we issue a manual flush.
 Enhancements:
 - Increase the memstore size so that we flush larger files
 - Decrease the compaction ratios (say increase the number of files to compact)
 Quick background:
 The bottlenecks in the multiput approach are that the data is transferred 
 *uncompressed* twice over the top-of-rack: once from the client to the RS (on 
 the multi put call) and again because of WAL (HDFS replication). We reduced 
 the former with RPC compression and eliminated the latter above while still 
 guaranteeing that data won't be lost.
 This is better than the MR bulk loader at a high level because we don't need 
 to merge sort all the files for a given region and then make it an HFile - 
 that's the equivalent of bulk loading AND major compacting in one shot. Also 
 there is much more disk involved in the MR method (sort/spill).
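
Steps 3 to 5 above can be tried out with a toy, in-memory model like the one below. Everything in it is an assumption made for illustration: ToyRegionServer, finishClient and the cookie-as-sequence-number scheme are hypothetical and are not the prototype mentioned in this issue; real cookies would be a memstoreTs or HLogSequenceId per region server.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the tracking-cookie handshake; all names are hypothetical.
class TrackingCookieSketch {

    /** Stand-in for a region server running with the WAL turned off. */
    static class ToyRegionServer {
        private final List<String> memstore = new ArrayList<>();
        private long nextSeqId = 1;   // next tracking cookie to hand out
        private long flushedUpTo = 0; // highest cookie known to be on disk

        /** Steps 3.1 and 3.2: accept a put and return its tracking cookie. */
        public synchronized long put(String row) {
            memstore.add(row);
            return nextSeqId++;
        }

        /** Step 3.3: has everything up to this cookie been flushed? */
        public synchronized boolean isFlushed(long cookie) {
            return cookie <= flushedUpTo;
        }

        /** Simulated flush: every cookie handed out so far becomes durable. */
        public synchronized void flush() {
            memstore.clear();
            flushedUpTo = nextSeqId - 1;
        }
    }

    /** Steps 4 and 5: wait for the last cookie, forcing a flush after the timeout. */
    public static boolean finishClient(ToyRegionServer rs, long lastCookie, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (!rs.isFlushed(lastCookie) && System.currentTimeMillis() < deadline) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        if (!rs.isFlushed(lastCookie)) {
            rs.flush(); // step 5: manual flush once the timeout has passed
        }
        return rs.isFlushed(lastCookie); // false would mean: fail and retry the task
    }

    public static void main(String[] args) {
        ToyRegionServer rs = new ToyRegionServer();
        long last = 0;
        for (String row : new String[] {"r1", "r2", "r3"}) {
            last = rs.put(row);
        }
        System.out.println("flushed before wait: " + rs.isFlushed(last));
        System.out.println("client succeeded:    " + finishClient(rs, last, 50));
    }
}
```

A real client would keep one last cookie per region server and check all of them before declaring success.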



[jira] [Resolved] (HBASE-6605) [0.89-fb] Allow bulk loading to continue past failures.

2013-03-06 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-6605.


Resolution: Fixed

 [0.89-fb] Allow bulk loading to continue past failures.
 ---

 Key: HBASE-6605
 URL: https://issues.apache.org/jira/browse/HBASE-6605
 Project: HBase
  Issue Type: Improvement
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor

 Currently, bulk loading a set of files will error out on the first file that 
 it is unable to bulk load.
 We have seen internal cases where there are transient failures (dfs/file 
 corruption etc) that may cause a file to not be successfully bulk loaded. It 
 seems useful to have a mode where the bulk load process tries all the input 
 files, even if there is a failure.
 Thus, at the end of the process all the files remaining in the original 
 directory are failed files.
 Adding a configuration option to enable this behavior. The default behavior 
 is still to error out on the first failure.
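
The two modes described above can be sketched as follows. This is a simplified stand-in, not the 0.89-fb change itself: the class name, the continueOnFailure flag and the Predicate used to "load" a file are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of "continue past failures" versus "fail fast".
class BulkLoadSketch {

    /**
     * Tries to load every file. loadOne returns true on success.
     * With continueOnFailure the failed files are collected and returned;
     * without it the first failure aborts the whole run (the default behavior).
     */
    public static List<String> bulkLoad(List<String> files,
                                        Predicate<String> loadOne,
                                        boolean continueOnFailure) {
        List<String> failed = new ArrayList<>();
        for (String file : files) {
            if (loadOne.test(file)) {
                continue; // loaded successfully
            }
            if (!continueOnFailure) {
                throw new IllegalStateException("bulk load failed on " + file);
            }
            failed.add(file); // stays behind in the original directory
        }
        return failed;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("hfile-1", "hfile-2", "hfile-3");
        // Pretend hfile-2 hits a transient dfs error.
        Predicate<String> loader = f -> !f.equals("hfile-2");
        System.out.println("failed files: " + bulkLoad(files, loader, true));
    }
}
```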



[jira] [Resolved] (HBASE-6451) Integrate getRegionServerWithRetries and getRegionServerWithoutRetries

2013-03-06 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-6451.


Resolution: Fixed

 Integrate getRegionServerWithRetries  and getRegionServerWithoutRetries
 ---

 Key: HBASE-6451
 URL: https://issues.apache.org/jira/browse/HBASE-6451
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.89-fb
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor

 Prakash has changed getRegionServerWithRetries to handle retries and
 failures better. This change tries to bring similar improvements to the 
 put path.



[jira] [Resolved] (HBASE-6840) SplitLogManager should reassign tasks even on a clean RS shutdown.

2013-03-06 Thread Amitanand Aiyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Amitanand Aiyer resolved HBASE-6840.


Resolution: Fixed

 SplitLogManager should reassign tasks even on a clean RS shutdown.
 --

 Key: HBASE-6840
 URL: https://issues.apache.org/jira/browse/HBASE-6840
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89-fb, 0.92.1, 0.94.1
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Minor

 SplitLogManager does not reassign tasks if the regionserver does a clean
 shutdown. We should reassign the task even if there is a clean shutdown.
 This is a problem if the shutting down RS is the 3rd splitlog worker. Master
 just sits there in a loop waiting for the task to finish, as the timeout
 will not reassign the task any further.



[jira] [Updated] (HBASE-7948) client doesn't need to refresh meta while the region is opening

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7948:


Attachment: HBASE-7948-v3.patch

Removing grace period.

 client doesn't need to refresh meta while the region is opening
 ---

 Key: HBASE-7948
 URL: https://issues.apache.org/jira/browse/HBASE-7948
 Project: HBase
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HBASE-7948-v0.patch, HBASE-7948-v1.patch, 
 HBASE-7948-v1.patch, HBASE-7948-v2.patch, HBASE-7948-v3.patch






[jira] [Updated] (HBASE-7791) Compaction USER_PRIORITY is slightly broken

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HBASE-7791:


Attachment: HBASE-7791-v1.patch

Updating patch according to feedback.

 Compaction USER_PRIORITY is slightly broken
 ---

 Key: HBASE-7791
 URL: https://issues.apache.org/jira/browse/HBASE-7791
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
Priority: Minor
 Attachments: HBASE-7791-v0.patch, HBASE-7791-v1.patch


 The code to get compaction priority is as such:
 {code}
   public int getCompactPriority(int priority) {
     // If this is a user-requested compaction, leave this at the highest priority
     if (priority == Store.PRIORITY_USER) {
       return Store.PRIORITY_USER;
     } else {
       return this.blockingStoreFileCount - this.storefiles.size();
     }
   }
 {code}
 PRIORITY_USER is 1.
 The priorities are compared as numbers in HRegion, so compactions of blocking 
 stores will override user priority (probably intended); also, if a store has 
 blockingStoreFileCount minus one files, its priority is suddenly 
 PRIORITY_USER, which may cause at least this:
 LOG.debug("Warning, compacting more than " + comConf.getMaxFilesToCompact() +
   " files because of a user-requested major compaction");
 as well as some misleading logging.
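
The collision is easy to reproduce in isolation. The sketch below copies the arithmetic from the snippet above into a stand-alone class; the class name, the NO_PRIORITY placeholder and the blocking count of 7 are assumptions chosen for the demo, and lower numbers are assumed to be served first, as the description implies.

```java
// Stand-alone reproduction of the priority arithmetic; names are hypothetical.
class CompactPrioritySketch {
    static final int PRIORITY_USER = 1;
    // Placeholder for "not a user request"; an assumption for this demo.
    static final int NO_PRIORITY = Integer.MIN_VALUE;

    /** Same logic as the quoted method, with the store state passed in. */
    public static int getCompactPriority(int priority,
                                         int blockingStoreFileCount,
                                         int storeFileCount) {
        if (priority == PRIORITY_USER) {
            return PRIORITY_USER;
        }
        return blockingStoreFileCount - storeFileCount;
    }

    public static void main(String[] args) {
        int blocking = 7; // arbitrary blocking threshold for the demo
        for (int files = 5; files <= 8; files++) {
            int p = getCompactPriority(NO_PRIORITY, blocking, files);
            String note = (p == PRIORITY_USER) ? "  <- looks like a user request" : "";
            System.out.println("files=" + files + " priority=" + p + note);
        }
    }
}
```

With 6 store files the system-triggered compaction comes back with priority 1, which is indistinguishable from PRIORITY_USER, and with 8 files the priority drops below it.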



[jira] [Commented] (HBASE-7721) Atomic multi-row mutations in META

2013-03-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595276#comment-13595276
 ] 

Hadoop QA commented on HBASE-7721:
--

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12572417/hbase-7721_v4.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 21 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/4700//console

This message is automatically generated.

 Atomic multi-row mutations in META
 --

 Key: HBASE-7721
 URL: https://issues.apache.org/jira/browse/HBASE-7721
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, regionserver
Reporter: Enis Soztutar
Assignee: Enis Soztutar
 Fix For: 0.95.0, 0.98.0

 Attachments: hbase-7721_v1.patch, hbase-7721_v2.patch, 
 hbase-7721_v3.patch, hbase-7721_v4.patch


 Thanks to Lars' local transactions patch (HBASE-5229), we can entertain the 
 possibility of doing local transactions within META regions.
 We need this mainly for region splits and merges. Clients scan the META 
 concurrently with the split/merge operations, and to prevent the clients from 
 seeing overlapping region boundaries or holes in META, we jump through hoops. 
 For more background, see BlockingMetaScannerVisitor, HBASE-5986, and my 
 comments at https://reviews.apache.org/r/8716/. 
 Now, for the actual implementation options: 
  1. As outlined in http://hadoop-hbase.blogspot.com/2012_02_01_archive.html, 
- We have to implement a Custom RegionSplitPolicy for the META regions to 
 ensure that a table's regions are always co-located in the same META region. 
 Then we can add MultiRowMutationEndpoint as a system level coprocessor, and 
 use it for META operations. 
  2. Do something like HBASE-7716, and expose local atomic multi-row operations as a 
 native API.
  3. Move META to zookeeper. Use zookeeper.multi.  
 Then we can change region split / merge logic to make use of atomic META 
 operations. 
 Thoughts, suggestions? 



[jira] [Created] (HBASE-8017) Upgrade hadoop 1 dependency to 1.1.2

2013-03-06 Thread Ted Yu (JIRA)
Ted Yu created HBASE-8017:
-

 Summary: Upgrade hadoop 1 dependency to 1.1.2
 Key: HBASE-8017
 URL: https://issues.apache.org/jira/browse/HBASE-8017
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Attachments: 8017.txt

Hadoop 1.1.2 has been released.
From Matt:

This release includes 24 bug fixes and backward-compatible enhancements,
compared to Hadoop 1.1.1.  Improvements include:

   - bug fixes in use of Kerberos security and SPNEGO
   - a couple potential deadlock situations
   - fixes for IBM JDK compatibility
   - several unit test failure cleanups
   - other useful improvements

For details, please see
http://hadoop.apache.org/docs/r1.1.2/releasenotes.html.



[jira] [Updated] (HBASE-7982) TestReplicationQueueFailover* runs for a minute, spews 3/4million lines complaining 'Filesystem closed', has an NPE, and still passes?

2013-03-06 Thread Jeffrey Zhong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-7982:
-

Attachment: hbase-7982-with-Stack-changes-02.patch

[~saint@gmail.com] For your convenience, I combined all the changes together.

Thanks,
-Jeffrey

 TestReplicationQueueFailover* runs for a minute, spews 3/4million lines 
 complaining 'Filesystem closed', has an NPE, and still passes?
 --

 Key: HBASE-7982
 URL: https://issues.apache.org/jira/browse/HBASE-7982
 Project: HBase
  Issue Type: Bug
  Components: build
Reporter: stack
Assignee: Jeffrey Zhong
Priority: Blocker
 Attachments: 7982v3.txt, hbase-7982-combined_1.patch, 
 hbase-7982-combined.patch, hbase-7982-huge-logging.patch, 
 hbase-7982-NPE_2.patch, hbase-7982-NPE.patch, 
 hbase-7982-with-Stack-changes-02.patch


 I was trying to look at why the odd time Hudson OOMEs trying to make a report 
 on 0.95 build #4 https://builds.apache.org/job/hbase-0.95/4/console:
 {code}
 ERROR: Failed to archive test reports
 hudson.util.IOException2: remote file operation failed: 
 /home/jenkins/jenkins-slave/workspace/hbase-0.95 at 
 hudson.remoting.Channel@151a4e3e:ubuntu3
   at hudson.FilePath.act(FilePath.java:861)
   at hudson.FilePath.act(FilePath.java:838)
   at hudson.tasks.junit.JUnitParser.parse(JUnitParser.java:87)
   at 
 ...
 Caused by: java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
   at java.nio.CharBuffer.allocate(CharBuffer.java:329)
   at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
   at java.nio.charset.Charset.decode(Charset.java:791)
   at hudson.tasks.junit.SuiteResult.<init>(SuiteResult.java:215)
 ...
 {code}
 We are trying to allocate a big buffer and failing.
 Looking at reports being generated, we have quite a few that are > 10MB in 
 size:
 {code}
 durruti:0.95 stack$ find hbase-* -type f -size +1k -exec ls -la {} \;
 -rw-r--r--@ 1 stack  staff  11126492 Feb 27 06:14 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.backup.TestHFileArchiving-output.txt
 -rw-r--r--@ 1 stack  staff  13296009 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestFromClientSide3-output.txt
 -rw-r--r--@ 1 stack  staff  10541898 Feb 27 05:47 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestMultiParallel-output.txt
 -rw-r--r--@ 1 stack  staff  25344601 Feb 27 05:51 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient-output.txt
 -rw-r--r--@ 1 stack  staff  17966969 Feb 27 06:12 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.TestEndToEndSplitTransaction-output.txt
 -rw-r--r--@ 1 stack  staff  17699068 Feb 27 06:09 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplit-output.txt
 -rw-r--r--@ 1 stack  staff  17701832 Feb 27 06:07 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.regionserver.wal.TestHLogSplitCompressed-output.txt
 -rw-r--r--@ 1 stack  staff  717853709 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailover-output.txt
 -rw-r--r--@ 1 stack  staff  563616793 Feb 27 06:17 
 hbase-server/target/surefire-reports/org.apache.hadoop.hbase.replication.TestReplicationQueueFailoverCompressed-output.txt
 {code}
 ... with TestReplicationQueueFailover* being an order of magnitude bigger than 
 the others.
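
As a rough illustration of the size check above, the following Java sketch walks a directory tree and lists the regular files larger than a threshold, similar in spirit to the find command. The class LargeReportFinder and its method are made up for this example and are not part of HBase, Surefire, or Jenkins.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class LargeReportFinder {

    /** Returns all regular files under root whose size exceeds minBytes, sorted by path. */
    static List<Path> largerThan(Path root, long minBytes) {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> sizeOf(p) > minBytes)
                        .sorted()
                        .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Wraps Files.size so it can be used inside a stream pipeline. */
    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LargeReportFinder.largerThan(Paths.get("hbase-server/target/surefire-reports"), 10L * 1024 * 1024) would, under these assumptions, return the report files above 10MB.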
 Looking in the test I see both spewing between 800 and 900 thousand lines in 
 about a minute.  Here is their fixation:
 {code}
 8908998 2013-02-27 06:17:48,176 ERROR 
 [RegionServer:1;hemera.apache.org,35712,1361945801803.logSyncer] 
 wal.FSHLog$LogSyncer(1012): Error while syncing, requesting close of hlog.
 8908999 java.io.IOException: Filesystem closed
 8909000 ,...at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:319)
 8909001 ,...at org.apache.hadoop.hdfs.DFSClient.access$1200(DFSClient.java:78)
 8909002 ,...at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3843)
 8909003 ,...at 
 org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
 8909004 ,...at 
 org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:999)
 8909005 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:248)
 8909006 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1120)
 8909007 ,...at 
 org.apache.hadoop.hbase.regionserver.wal.FSHLog.syncer(FSHLog.java:1058)
 8909008 ,...at 
 

[jira] [Updated] (HBASE-8017) Upgrade hadoop 1 dependency to 1.1.2

2013-03-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8017:
--

Attachment: 8017.txt

 Upgrade hadoop 1 dependency to 1.1.2
 

 Key: HBASE-8017
 URL: https://issues.apache.org/jira/browse/HBASE-8017
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
 Attachments: 8017.txt


 Hadoop 1.1.2 has been released.
 From Matt:
 This release includes 24 bug fixes and backward-compatible enhancements,
 compared to Hadoop 1.1.1.  Improvements include:
- bug fixes in use of Kerberos security and SPNEGO
- a couple potential deadlock situations
- fixes for IBM JDK compatibility
- several unit test failure cleanups
- other useful improvements
 For details, please see
 http://hadoop.apache.org/docs/r1.1.2/releasenotes.html.



[jira] [Updated] (HBASE-8017) Upgrade hadoop 1 dependency to 1.1.2

2013-03-06 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-8017:
--

Fix Version/s: 0.98.0
 Assignee: Ted Yu
   Status: Patch Available  (was: Open)

 Upgrade hadoop 1 dependency to 1.1.2
 

 Key: HBASE-8017
 URL: https://issues.apache.org/jira/browse/HBASE-8017
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu
Assignee: Ted Yu
 Fix For: 0.98.0

 Attachments: 8017.txt


 Hadoop 1.1.2 has been released.
 From Matt:
 This release includes 24 bug fixes and backward-compatible enhancements,
 compared to Hadoop 1.1.1.  Improvements include:
- bug fixes in use of Kerberos security and SPNEGO
- a couple potential deadlock situations
- fixes for IBM JDK compatibility
- several unit test failure cleanups
- other useful improvements
 For details, please see
 http://hadoop.apache.org/docs/r1.1.2/releasenotes.html.



[jira] [Commented] (HBASE-6772) Make the Distributed Split HDFS Location aware

2013-03-06 Thread Jeffrey Zhong (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13595290#comment-13595290
 ] 

Jeffrey Zhong commented on HBASE-6772:
--

{quote}
Just one point: the master should provide the full list of regionservers owning 
a copy
{quote}

Good point. We can store up to 3 region servers as a preferred list when 
available local RSs to a WAL = 3, to avoid potentially storing too much 
information in ZK. An RS can then pick a WAL with higher priority if it's in 
the preferred list. 


 Make the Distributed Split HDFS Location aware
 --

 Key: HBASE-6772
 URL: https://issues.apache.org/jira/browse/HBASE-6772
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: Jeffrey Zhong

 During a hlog split, each log file (a single hdfs block) is allocated to a 
 different region server. This region server reads the file and creates the 
 recovery edit files.
 The allocation to the region server is random. We could take into account the 
 locations of the log file to split:
 - the reads would be local, hence faster. This allows short circuit as well.
 - less network i/o used during a failure (and this is important)
 - we would be sure to read from a working datanode, hence we're sure we won't 
 have read errors. Read errors slow the split process a lot, as we often end up 
 waiting for timeouts. 
 We need to limit the calls to the namenode however.
 Typical algo could be:
 - the master gets the locations of the hlog files
 - it writes them into ZK, if possible in one transaction (this way all the 
 tasks become visible together, allowing some arbitration by the region server).
 - when the regionserver receives the event, it checks for all logs and all 
 locations.
 - if there is a match, it takes it
 - if not, it waits something like 0.2s (to give other regionservers time to 
 take it if the location matches), and then takes any remaining task.
 Drawbacks are:
 - a 0.2s delay added if there is no regionserver available on one of the 
 locations. It's likely possible to remove it with some extra synchronization.
 - Small increase in complexity and dependency on HDFS
 Considering the advantages, it's worth it imho.
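
The algorithm described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class LocalityAwareTaskPicker, its methods, and the plain collections standing in for the ZK task list and the HDFS block locations are all hypothetical and do not correspond to actual HBase code.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch: task names are log file names, and logLocations maps
// each log file to the hosts that hold a replica of its block.
final class LocalityAwareTaskPicker {

    /** Returns the first pending task whose log file has a replica on this host, if any. */
    static Optional<String> pickLocalTask(String host, List<String> pendingTasks,
                                          Map<String, Set<String>> logLocations) {
        for (String task : pendingTasks) {
            Set<String> hosts = logLocations.getOrDefault(task, Collections.emptySet());
            if (hosts.contains(host)) {
                return Optional.of(task);
            }
        }
        return Optional.empty();
    }

    /**
     * Prefers a local task. If there is none, waits graceMillis (0.2s in the
     * description) so that servers holding a replica can claim their tasks,
     * then re-reads the pending list and takes any remaining task.
     */
    static Optional<String> pickTask(String host, Supplier<List<String>> pendingTasks,
                                     Map<String, Set<String>> logLocations, long graceMillis) {
        Optional<String> local = pickLocalTask(host, pendingTasks.get(), logLocations);
        if (local.isPresent()) {
            return local;
        }
        try {
            Thread.sleep(graceMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
        return pendingTasks.get().stream().findFirst();
    }
}
```

The supplier is re-read after the wait because other region servers may have claimed tasks in the meantime. A real implementation would also need an atomic claim step in ZK, which this sketch leaves out.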



[jira] [Assigned] (HBASE-7967) implement compactor for stripe compactions

2013-03-06 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HBASE-7967:
---

Assignee: Sergey Shelukhin

 implement compactor for stripe compactions
 --

 Key: HBASE-7967
 URL: https://issues.apache.org/jira/browse/HBASE-7967
 Project: HBase
  Issue Type: Sub-task
  Components: Compaction
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin

 Compactor needs to be implemented. See details in parent and blocking jira.



[jira] [Created] (HBASE-8018) Add Flaky Testcase Detector tool into dev-tools

2013-03-06 Thread Jeffrey Zhong (JIRA)
Jeffrey Zhong created HBASE-8018:


 Summary: Add Flaky Testcase Detector tool into dev-tools
 Key: HBASE-8018
 URL: https://issues.apache.org/jira/browse/HBASE-8018
 Project: HBase
  Issue Type: Bug
  Components: util
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
 Fix For: 0.98.0


Hey,

Recently I have been working on some HBase test case failures, and I think it 
would be useful to see a report of all failed test cases from the most recent 
runs, so that we can easily see how flaky a test case is. I wrote a tool some 
time back; below are some reports against different branches from today's run. 
You can get the tool's source at https://github.com/jeffreyz88/jenkins-tools. 
If we run it daily and send out an email, we can quickly notice what may have 
broken after recent check-ins. 

Notes: 
1) Numbers such as 873  874  875  876  877  878  879  880  881 are builds that 
have failed test cases in the current or previous runs
2) 1 means PASSED, 0 means NOT RUN AT ALL, -1 means FAILED

HBase-0.95 (from last 10 runs, configurable)

Failed Test Cases   21   22   23   24   25   27

org.apache.hadoop.hbase.catalog.testmetamigrationconvertingtopb.org.apache.hadoop.hbase.catalog.testmetamigrationconvertingtopb
                     0    0    0    0   -1   -1
org.apache.hadoop.hbase.coprocessor.example.testbulkdeleteprotocol.testbulkdeletecolumn
                    -1    0    0    0    0    0
org.apache.hadoop.hbase.coprocessor.example.testrowcountendpoint.org.apache.hadoop.hbase.coprocessor.example.testrowcountendpoint
                    -1    0    0    0    0    0
org.apache.hadoop.hbase.coprocessor.example.testzookeeperscanpolicyobserver.org.apache.hadoop.hbase.coprocessor.example.testzookeeperscanpolicyobserver
                    -1    0    0    0    0    0
org.apache.hadoop.hbase.master.testmasterfailover.testmasterfailoverwithmockedritondeadrs
                     1   -1   -1    0    1    1
org.apache.hadoop.hbase.regionserver.testsplittransactiononcluster.testshouldthrowioexceptionifstorefilesizeisemptyandshouldsuccessfullyexecuterollback
                     1    1    1    1    1   -1
org.apache.hadoop.hbase.regionserver.testsplittransactiononcluster.testshutdownfixupwhendaughterhassplit
                     1    1    1   -1   -1   -1
org.apache.hadoop.hbase.regionserver.wal.testhlog.testlogcleaning
                     0    1    1    1   -1    0
org.apache.hadoop.hbase.replication.testmasterreplication.testcyclicreplication
                     1    1    1    1    1   -1
org.apache.hadoop.hbase.replication.testreplicationqueuefailover.queuefailover
                     1   -1    0    1   -1    0
org.apache.hadoop.hbase.replication.testreplicationqueuefailovercompressed.queuefailover
                     0    1   -1    0   -1    0
org.apache.hadoop.hbase.security.access.testaccesscontroller.org.apache.hadoop.hbase.security.access.testaccesscontroller
                     0   -1    0    0    0    0

As you can see, we have a few test cases that have not run successfully at 
all, or not recently.
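
To make the 1 / 0 / -1 encoding from the notes concrete, here is a small Java sketch that summarizes one report row. It is only an illustration: FlakinessReport and its methods are hypothetical and are not taken from the jenkins-tools project, and the definition of "flaky" used here (at least one pass and at least one failure in the window) is an assumption.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only.
// Encoding, as in the notes: 1 = PASSED, 0 = NOT RUN AT ALL, -1 = FAILED.
final class FlakinessReport {

    /** Fraction of executed runs that failed; runs marked 0 (not run) are ignored. */
    static double failureRate(int[] results) {
        long executed = Arrays.stream(results).filter(r -> r != 0).count();
        long failed = Arrays.stream(results).filter(r -> r == -1).count();
        return executed == 0 ? 0.0 : (double) failed / executed;
    }

    /** Assumed definition: flaky means the test both passed and failed within the window. */
    static boolean isFlaky(int[] results) {
        boolean passed = Arrays.stream(results).anyMatch(r -> r == 1);
        boolean failed = Arrays.stream(results).anyMatch(r -> r == -1);
        return passed && failed;
    }
}
```

For a row such as 1, -1, -1, 0, 1, 1, five runs were executed and two failed, so failureRate gives 0.4 and isFlaky is true. A row such as -1, 0, 0, 0, 0, 0 has a failure rate of 1.0 but is not flaky under this definition, since the test never passed.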

HBase-TRUNK (from last 10 runs)

Failed Test Cases 3908 3909 3910 3912 3913 3914 3915 3916

org.apache.hadoop.hbase.catalog.testmetamigrationconvertingtopb.org.apache.hadoop.hbase.catalog.testmetamigrationconvertingtopb
                     0   -1    0    0    0    0   -1   -1
org.apache.hadoop.hbase.client.testadmin.testcloseregionwhenservernameisempty
                     1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.client.testscannertimeout.test3686a
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.client.testsnapshotcloneindependence.testofflinesnapshotregionoperationsindependent
                     0    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.client.testsnapshotcloneindependence.testonlinesnapshotregionoperationsindependent
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.master.testassignmentmanageroncluster.testmoveregion
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.master.testdistributedlogsplitting.testdelayeddeleteonfailure
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.master.testmasterfailover.testmasterfailoverwithmockedritondeadrs
                     1    1    1    1    1   -1    0    1
org.apache.hadoop.hbase.master.testtablelockmanager.testdelete
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.procedure.testzkprocedurecontrollers.testsimplezkcohortmembercontroller
                     1    1    1    1    1    1   -1    0
org.apache.hadoop.hbase.regionserver.testsplittransactiononcluster.testexistingznodeblockssplitandwerollback
                     1    1    1    1    1   -1    0    1
org.apache.hadoop.hbase.regionserver.testsplittransactiononcluster.testshouldthrowioexceptionifstorefilesizeisemptyandshouldsuccessfullyexecuterollback
                     1    1   -1    0    1    1    1    1
org.apache.hadoop.hbase.regionserver.testsplittransactiononcluster.testshutdownfixupwhendaughterhassplit
                    -1   -1   -1    0   -1    0   -1    0
