[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread ShiXing (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ShiXing updated HBASE-3725:
---

Attachment: HBASE-3725-0.92-V6.patch

To Ted:

bq. TestHRegion#testIncrementWithFlushAndDelete passed without that assignment.

Because the iscan also reads from the memstore after I remove this code:
{code}
List<KeyValue> fileResults = new ArrayList<KeyValue>();
- iscan.checkOnlyStoreFiles();
scanner = null;
try {
scanner = getScanner(iscan);
{code}

And since there is no result in the memstore, increment treats the value as 0, which has the 
same effect as a delete.

I add this case in TestHRegion#testIncrementWithFlushAndDelete in V6.
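The semantics described above — an increment that finds no result starts from 0, the same observable outcome as a delete — can be sketched outside HBase (illustrative only; the class and method names below are made up for the sketch, not HBase API):

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementSketch {
    // Stand-in for a column's visible values; a missing key models both
    // "never written" and "deleted" (each reads back as no result).
    private final Map<String, Long> values = new HashMap<>();

    // Mirrors the semantics discussed above: when no result is found,
    // the increment treats the current value as 0.
    public long increment(String qualifier, long amount) {
        long current = values.getOrDefault(qualifier, 0L);
        long updated = current + amount;
        values.put(qualifier, updated);
        return updated;
    }

    public void delete(String qualifier) {
        values.remove(qualifier);
    }
}
```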

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: Jonathan Gray
 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
 HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
 HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
    * This code reproduces a bug with increment column values in HBase.
    * Usage: First run part one by passing '1' as the first arg.
    *        Then restart the HBase cluster so it writes everything to disk.
    *        Run part two by passing '2' as the first arg.
    *
    * This will result in the old deleted data being found and used for
    * the increment calls.
    *
    * @param args
    * @throws IOException
    */
   public static void main(String[] args) throws IOException
   {
   if("1".equals(args[0]))
   partOne();
   if("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
    * Creates a table and increments a column value 10 times by 10 each time.
    * Results in a value of 100 for the column.
    *
    * @throws IOException
    */
   static void partOne() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   //Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r 

[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418968#comment-13418968
 ] 

Hadoop QA commented on HBASE-6411:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537268/HBASE-6411-0.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 18 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 16 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
  
org.apache.hadoop.hbase.security.access.TestZKPermissionsWatcher

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2418//console

This message is automatically generated.

 Move Master Metrics to metrics 2
 

 Key: HBASE-6411
 URL: https://issues.apache.org/jira/browse/HBASE-6411
 Project: HBase
  Issue Type: Sub-task
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6411-0.patch, HBASE-6411_concept.patch


 Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)
binlijin created HBASE-6433:
---

 Summary: improve getRemoteAddress
 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor








[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Status: Patch Available  (was: Open)

 Filter with filterRow() returning true is also incompatible with scan with 
 limit
 

 Key: HBASE-6429
 URL: https://issues.apache.org/jira/browse/HBASE-6429
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch


 Currently if we scan with both a limit and a Filter with 
 filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
 be thrown. The same exception should also be thrown if the filter has its 
 filterRow() implemented.
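The rule the report asks for can be sketched as a standalone validation check (an illustration of the intended behavior only; this is not the actual HBase scanner code, and the names below are hypothetical):

```java
public class ScanCompatibilityCheck {
    static class IncompatibleFilterException extends RuntimeException {
        IncompatibleFilterException(String message) { super(message); }
    }

    // A filter that filters at row granularity must see the whole row before
    // deciding to drop it, so it cannot be combined with a per-row limit
    // that may hand back partial rows.
    static void validate(boolean filterHasFilterRow, int scanLimit) {
        if (filterHasFilterRow && scanLimit > 0) {
            throw new IncompatibleFilterException(
                "filterRow() filters cannot be used with a scan limit");
        }
    }
}
```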





[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419046#comment-13419046
 ] 

Zhihong Ted Yu commented on HBASE-3725:
---

How about renaming leftResults to remainingResults?

Please prepare patch for trunk. 

Thanks

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: Jonathan Gray
 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
 HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
 HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
    * This code reproduces a bug with increment column values in HBase.
    * Usage: First run part one by passing '1' as the first arg.
    *        Then restart the HBase cluster so it writes everything to disk.
    *        Run part two by passing '2' as the first arg.
    *
    * This will result in the old deleted data being found and used for
    * the increment calls.
    *
    * @param args
    * @throws IOException
    */
   public static void main(String[] args) throws IOException
   {
   if("1".equals(args[0]))
   partOne();
   if("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
    * Creates a table and increments a column value 10 times by 10 each time.
    * Results in a value of 100 for the column.
    *
    * @throws IOException
    */
   static void partOne() throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   //Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r = table.get(get);
   System.out.println("initial values: new " + Bytes.toLong(r.getValue(infoCF, newInc))
       + " old " + Bytes.toLong(r.getValue(infoCF, oldInc)));
   }
   /**
    * First deletes the data then increments the column 10 times by 1 each time.
    *
    * Should result in a value of 10 but it doesn't, it results in 

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Attachment: HBASE-6433-trunk.patch

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch








[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Attachment: HBASE-6433-94.patch
HBASE-6433-90.patch
HBASE-6433-92.patch

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch








[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419049#comment-13419049
 ] 

Hadoop QA commented on HBASE-6429:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537300/hbase-6429-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestCheckTestClasses

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//console

This message is automatically generated.

 Filter with filterRow() returning true is also incompatible with scan with 
 limit
 

 Key: HBASE-6429
 URL: https://issues.apache.org/jira/browse/HBASE-6429
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch


 Currently if we scan with both a limit and a Filter with 
 filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
 be thrown. The same exception should also be thrown if the filter has its 
 filterRow() implemented.





[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Description: Without this patch it costs 4000ns, with this patch it costs 
1600ns
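The patch itself is not quoted in this thread, but a per-call cost drop of that shape typically comes from computing the remote address once and reusing it, rather than re-deriving it on every call. A minimal sketch of that idea (the field and method names are assumptions for illustration, not the actual patch):

```java
import java.net.InetAddress;

public class ConnectionSketch {
    private final InetAddress remoteAddress;
    private String cachedRemoteAddress; // computed on first use, then reused

    public ConnectionSketch(InetAddress remoteAddress) {
        this.remoteAddress = remoteAddress;
    }

    // Subsequent calls return the cached string, avoiding repeated
    // address-to-string conversion work on the hot path.
    public String getRemoteAddress() {
        if (cachedRemoteAddress == null) {
            cachedRemoteAddress = remoteAddress.getHostAddress();
        }
        return cachedRemoteAddress;
    }
}
```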

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Status: Patch Available  (was: Open)

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Summary: Filter with filterRow() returning true is incompatible with scan 
with limit  (was: Filter with filterRow() returning true is also incompatible 
with scan with limit)

 Filter with filterRow() returning true is incompatible with scan with limit
 ---

 Key: HBASE-6429
 URL: https://issues.apache.org/jira/browse/HBASE-6429
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch


 Currently if we scan with both a limit and a Filter with 
 filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
 be thrown. The same exception should also be thrown if the filter has its 
 filterRow() implemented.





[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419124#comment-13419124
 ] 

Zhihong Ted Yu commented on HBASE-6429:
---

TestFilterWithScanLimits.java and FilterWrapper.java need Apache license.

{code}
+if(null == filter) {
{code}
Space between if and (.

Why does TestFilterWithScanLimits have a main() method?
It should be classified as a medium test.

 Filter with filterRow() returning true is incompatible with scan with limit
 ---

 Key: HBASE-6429
 URL: https://issues.apache.org/jira/browse/HBASE-6429
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch


 Currently if we scan with both a limit and a Filter with 
 filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
 be thrown. The same exception should also be thrown if the filter has its 
 filterRow() implemented.





[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Status: Open  (was: Patch Available)

 Filter with filterRow() returning true is incompatible with scan with limit
 ---

 Key: HBASE-6429
 URL: https://issues.apache.org/jira/browse/HBASE-6429
 Project: HBase
  Issue Type: Bug
  Components: filters
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch


 Currently if we scan with both a limit and a Filter with 
 filterRow(List<KeyValue>) implemented, an IncompatibleFilterException will 
 be thrown. The same exception should also be thrown if the filter has its 
 filterRow() implemented.





[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419152#comment-13419152
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

The trunk patch contains a reordering of imports, which distracts from the 
goal of this JIRA.

Otherwise patch looks good.

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Assigned] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-6433:
-

Assignee: binlijin

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419154#comment-13419154
 ] 

Hadoop QA commented on HBASE-6433:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537328/HBASE-6433-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//console

This message is automatically generated.

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
 HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Attachment: 6433-getRemoteAddress-trunk.txt

Simplified patch for trunk.

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419207#comment-13419207
 ] 

Zhihong Ted Yu commented on HBASE-5547:
---

Will integrate in 3 hours if there is no objection.

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.
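Option #3 (a general trash directory) can be sketched roughly as below. This is a minimal illustration using java.nio.file rather than HDFS's FileSystem API; the class name and path layout are hypothetical, not what any attached patch does:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch: move HFiles into a general trash directory instead of deleting
// them, so a backup process can still read them. Hypothetical names; a real
// implementation would work against the HDFS FileSystem API.
public class TrashingCleaner {
    private final Path trashDir;

    public TrashingCleaner(Path trashDir) throws IOException {
        this.trashDir = Files.createDirectories(trashDir);
    }

    /** Computes where a given HFile would land inside the trash directory. */
    static Path targetFor(Path trashDir, Path hfile) {
        return trashDir.resolve(hfile.getFileName().toString());
    }

    /** Renames the HFile into the trash directory instead of deleting it. */
    public Path trash(Path hfile) throws IOException {
        return Files.move(hfile, targetFor(trashDir, hfile),
                StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("hfiles");
        Path hfile = Files.createFile(tmp.resolve("abc123"));
        TrashingCleaner cleaner = new TrashingCleaner(tmp.resolve(".trash"));
        Path moved = cleaner.trash(hfile);
        // The original is gone; the trashed copy survives for the backup.
        System.out.println(!Files.exists(hfile) && Files.exists(moved));
    }
}
```

Cleanup then becomes a periodic purge of the trash directory once no backup is in progress.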

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419255#comment-13419255
 ] 

Zhihong Ted Yu commented on HBASE-3725:
---

In trunk, getLastIncrement() call has been replaced with:
{code}
  List<KeyValue> results = get(get, false);
{code}

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: Jonathan Gray
 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
 HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
 HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
* This code reproduces a bug with increment column values in hbase
* Usage: First run part one by passing '1' as the first arg
*Then restart the hbase cluster so it writes everything to disk
*Run part two by passing '2' as the first arg
*
* This will result in the old deleted data being found and used for 
 the increment calls
*
* @param args
* @throws IOException
*/
   public static void main(String[] args) throws IOException
   {
   if ("1".equals(args[0]))
   partOne();
   if ("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
* Creates a table and increments a column value 10 times by 10 each 
 time.
* Results in a value of 100 for the column
*
* @throws IOException
*/
   static void partOne()throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   // Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, 
 (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r = table.get(get);
   System.out.println("initial values: new " + 
 Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
 Bytes.toLong(r.getValue(infoCF, oldInc)));
   }
   /**
* First deletes the data then increments the column 10 times by 1 each 
 time
*
* Should result in a value of 10 

[jira] [Assigned] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-3725:
-

Assignee: ShiXing  (was: Jonathan Gray)

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: ShiXing
 Fix For: 0.92.2

 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
 HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
 HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
* This code reproduces a bug with increment column values in hbase
* Usage: First run part one by passing '1' as the first arg
*Then restart the hbase cluster so it writes everything to disk
*Run part two by passing '2' as the first arg
*
* This will result in the old deleted data being found and used for 
 the increment calls
*
* @param args
* @throws IOException
*/
   public static void main(String[] args) throws IOException
   {
   if ("1".equals(args[0]))
   partOne();
   if ("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
* Creates a table and increments a column value 10 times by 10 each 
 time.
* Results in a value of 100 for the column
*
* @throws IOException
*/
   static void partOne()throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   // Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, 
 (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r = table.get(get);
   System.out.println("initial values: new " + 
 Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
 Bytes.toLong(r.getValue(infoCF, oldInc)));
   }
   /**
* First deletes the data then increments the column 10 times by 1 each 
 time
*
* Should result in a value of 10, but it doesn't; it results in a 
 value of 110
*
* @throws IOException
*/
   static void 

[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-3725:
--

Fix Version/s: 0.92.2

 HBase increments from old value after delete and write to disk
 --

 Key: HBASE-3725
 URL: https://issues.apache.org/jira/browse/HBASE-3725
 Project: HBase
  Issue Type: Bug
  Components: io, regionserver
Affects Versions: 0.90.1
Reporter: Nathaniel Cook
Assignee: ShiXing
 Fix For: 0.92.2

 Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
 HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
 HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
 HBASE-3725.patch


 Deleted row values are sometimes used for starting points on new increments.
 To reproduce:
 Create a row r. Set column x to some default value.
 Force hbase to write that value to the file system (such as restarting the 
 cluster).
 Delete the row.
 Call table.incrementColumnValue with some_value
 Get the row.
 The returned value in the column was incremented from the old value before 
 the row was deleted instead of being initialized to some_value.
 Code to reproduce:
 {code}
 import java.io.IOException;
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.HColumnDescriptor;
 import org.apache.hadoop.hbase.HTableDescriptor;
 import org.apache.hadoop.hbase.client.Delete;
 import org.apache.hadoop.hbase.client.Get;
 import org.apache.hadoop.hbase.client.HBaseAdmin;
 import org.apache.hadoop.hbase.client.HTableInterface;
 import org.apache.hadoop.hbase.client.HTablePool;
 import org.apache.hadoop.hbase.client.Increment;
 import org.apache.hadoop.hbase.client.Result;
 import org.apache.hadoop.hbase.util.Bytes;
 public class HBaseTestIncrement
 {
   static String tableName = "testIncrement";
   static byte[] infoCF = Bytes.toBytes("info");
   static byte[] rowKey = Bytes.toBytes("test-rowKey");
   static byte[] newInc = Bytes.toBytes("new");
   static byte[] oldInc = Bytes.toBytes("old");
   /**
* This code reproduces a bug with increment column values in hbase
* Usage: First run part one by passing '1' as the first arg
*Then restart the hbase cluster so it writes everything to disk
*Run part two by passing '2' as the first arg
*
* This will result in the old deleted data being found and used for 
 the increment calls
*
* @param args
* @throws IOException
*/
   public static void main(String[] args) throws IOException
   {
   if ("1".equals(args[0]))
   partOne();
   if ("2".equals(args[0]))
   partTwo();
   if ("both".equals(args[0]))
   {
   partOne();
   partTwo();
   }
   }
   /**
* Creates a table and increments a column value 10 times by 10 each 
 time.
* Results in a value of 100 for the column
*
* @throws IOException
*/
   static void partOne()throws IOException
   {
   Configuration conf = HBaseConfiguration.create();
   HBaseAdmin admin = new HBaseAdmin(conf);
   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
   tableDesc.addFamily(new HColumnDescriptor(infoCF));
   if(admin.tableExists(tableName))
   {
   admin.disableTable(tableName);
   admin.deleteTable(tableName);
   }
   admin.createTable(tableDesc);
   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
   // Increment uninitialized column
   for (int j = 0; j < 10; j++)
   {
   table.incrementColumnValue(rowKey, infoCF, oldInc, 
 (long)10);
   Increment inc = new Increment(rowKey);
   inc.addColumn(infoCF, newInc, (long)10);
   table.increment(inc);
   }
   Get get = new Get(rowKey);
   Result r = table.get(get);
   System.out.println("initial values: new " + 
 Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
 Bytes.toLong(r.getValue(infoCF, oldInc)));
   }
   /**
* First deletes the data then increments the column 10 times by 1 each 
 time
*
* Should result in a value of 10, but it doesn't; it results in a 
 value of 110
*
* @throws IOException
*/
   static void partTwo()throws 

[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419263#comment-13419263
 ] 

Andrew Purtell commented on HBASE-6432:
---

Seems reasonable and low risk to pull the ID from ZooKeeper.

 HRegionServer doesn't properly set clusterId in conf
 

 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.96.0

 Attachments: HBASE-6432_94.patch


 ClusterId is normally set into the passed conf during instantiation of an 
 HTable class. In the case of an HRegionServer this is bypassed and set to 
 "default", since getMaster() uses HBaseRPC to create the proxy directly and 
 bypasses the class which retrieves and sets the correct clusterId. 
 This becomes a problem for clients (i.e. within a coprocessor) using 
 delegation tokens for authentication: the token's service will be the 
 correct clusterId, while the TokenSelector is looking for one with service 
 "default".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6432:
--

Affects Version/s: 0.96.0
Fix Version/s: (was: 0.96.0)

 HRegionServer doesn't properly set clusterId in conf
 

 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: HBASE-6432_94.patch


 ClusterId is normally set into the passed conf during instantiation of an 
 HTable class. In the case of an HRegionServer this is bypassed and set to 
 "default", since getMaster() uses HBaseRPC to create the proxy directly and 
 bypasses the class which retrieves and sets the correct clusterId. 
 This becomes a problem for clients (i.e. within a coprocessor) using 
 delegation tokens for authentication: the token's service will be the 
 correct clusterId, while the TokenSelector is looking for one with service 
 "default".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271
 ] 

Andrew Purtell edited comment on HBASE-6432 at 7/20/12 3:44 PM:


However, the master is responsible for publishing the cluster ID to ZooKeeper. 
If on a fresh install the regionservers are started first, then they won't find 
the ID up in ZK until the master comes up. I think this should be a Chore that 
retries until the ID is found then exits.

  was (Author: apurtell):
However, the master is responsible for publishing the cluster ID to 
ZooKeeper. If on a fresh install the regionservers are started first, then they 
won't find the ID up in ZK. I think this should be a Chore that retries until 
the ID is found then exits.
  
 HRegionServer doesn't properly set clusterId in conf
 

 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: HBASE-6432_94.patch


 ClusterId is normally set into the passed conf during instantiation of an 
 HTable class. In the case of an HRegionServer this is bypassed and set to 
 "default", since getMaster() uses HBaseRPC to create the proxy directly and 
 bypasses the class which retrieves and sets the correct clusterId. 
 This becomes a problem for clients (i.e. within a coprocessor) using 
 delegation tokens for authentication: the token's service will be the 
 correct clusterId, while the TokenSelector is looking for one with service 
 "default".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271
 ] 

Andrew Purtell commented on HBASE-6432:
---

However, the master is responsible for publishing the cluster ID to ZooKeeper. 
If on a fresh install the regionservers are started first, then they won't find 
the ID up in ZK. I think this should be a Chore that retries until the ID is 
found then exits.
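The "Chore that retries until the ID is found" idea could be sketched as below. This is a minimal, self-contained illustration: the Supplier stands in for a ZooKeeper read of the cluster-id znode, and all names here are invented, not HBase's actual API:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Sketch of the suggested Chore: poll until the master has published the
// cluster ID, then exit. Hypothetical names; not HBase's real Chore class.
public class ClusterIdChore {
    static String waitForClusterId(Supplier<String> fetcher, int maxAttempts,
                                   long sleepMs) {
        for (int i = 0; i < maxAttempts; i++) {
            String id = fetcher.get();      // e.g. read the cluster-id znode
            if (id != null) {
                return id;                  // found it: the chore can stop
            }
            try {
                Thread.sleep(sleepMs);      // master not up yet, retry later
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;                        // still unpublished after all tries
    }

    public static void main(String[] args) {
        AtomicInteger polls = new AtomicInteger();
        // Simulate a master that publishes the ID on the third poll.
        Supplier<String> zk = () -> polls.incrementAndGet() < 3 ? null : "cluster-uuid";
        System.out.println(waitForClusterId(zk, 10, 1));  // prints cluster-uuid
    }
}
```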

 HRegionServer doesn't properly set clusterId in conf
 

 Key: HBASE-6432
 URL: https://issues.apache.org/jira/browse/HBASE-6432
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.0, 0.96.0
Reporter: Francis Liu
Assignee: Francis Liu
 Attachments: HBASE-6432_94.patch


 ClusterId is normally set into the passed conf during instantiation of an 
 HTable class. In the case of a HRegionServer this is bypassed and set to 
 default since getMaster() since it uses HBaseRPC to create the proxy 
 directly and bypasses the class which retrieves and sets the correct 
 clusterId. 
 This becomes a problem with clients (ie within a coprocessor) using 
 delegation tokens for authentication. Since the token's service will be the 
 correct clusterId and while the TokenSelector is looking for one with service 
 default.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419279#comment-13419279
 ] 

Andrew Purtell commented on HBASE-6428:
---

bq. For example one could envision storing old versions of a KV in separate 
HFiles, which then rarely have to be touched/cached by queries querying for new 
data. In addition these date-ranged HFiles can be easily used for backups while 
maintaining historical data.

I'd be curious if you think the Coprocessor API for compactions cannot be 
reworked to handle this.

 Pluggable Compaction policies
 -

 Key: HBASE-6428
 URL: https://issues.apache.org/jira/browse/HBASE-6428
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 For some use cases it is useful to allow more control over how KVs get 
 compacted.
 For example one could envision storing old versions of a KV in separate HFiles, 
 which then rarely have to be touched/cached by queries querying for new data.
 In addition these date-ranged HFiles can be easily used for backups while 
 maintaining historical data.
 This would be a major change, allowing compactions to provide multiple 
 targets (not just a filter).
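What "multiple targets" might look like can be sketched with a toy router that sends each cell to one of several output files by age. The class name, cutoff, and target indices below are invented for illustration; this is not an HBase API:

```java
// Hypothetical sketch of compaction with multiple targets: route each cell
// to one of several output HFiles by timestamp. Not an HBase API.
public class CompactionRouter {
    static final long CUTOFF_MS = 30L * 24 * 60 * 60 * 1000;  // 30 days

    /** Returns 0 for the "hot" output HFile, 1 for the "historical" one. */
    static int target(long cellTimestamp, long now) {
        return (now - cellTimestamp) > CUTOFF_MS ? 1 : 0;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(target(now, now));                  // recent cell -> 0
        System.out.println(target(now - 2 * CUTOFF_MS, now));  // old cell -> 1
    }
}
```

A pluggable policy would let the compaction code ask such a router where each KV belongs instead of always writing a single output file.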

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419285#comment-13419285
 ] 

Hadoop QA commented on HBASE-6433:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537359/6433-getRemoteAddress-trunk.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//console

This message is automatically generated.

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6434) Document effect of slow compressors on the flush path and workaround in the online book

2012-07-20 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-6434:
-

 Summary: Document effect of slow compressors on the flush path and 
workaround in the online book
 Key: HBASE-6434
 URL: https://issues.apache.org/jira/browse/HBASE-6434
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor


In HBASE-6423 Karthik writes
bq. 1. flushing a memstore takes a while (GZIP compression)
[... and the memstore gate comes crashing down]

We once sidestepped this issue by specifying different compression options for 
flushes (LZO or none) and major compaction (BZIP2), disabling automatic major 
compaction, and managing major compaction from a shell based process that 
iterates over each region on disk and makes some application specific decisions.

I go back and forth on whether this is a hack or legitimate HBase ops given how 
things currently work.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419316#comment-13419316
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

I ran the two tests listed above and they passed:
{code}
Running org.apache.hadoop.hbase.master.TestSplitLogManager
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.933 sec
Running org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 68.194 sec
{code}

Will integrate to trunk later today if there is no objection.

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)
nkeywal created HBASE-6435:
--

 Summary: Reading WAL files after a recovery leads to time lost in 
HDFS timeouts when using dead datanodes
 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal


HBase writes a Write-Ahead-Log to recover from hardware failure.
This log is written with 'append' on hdfs.
Through ZooKeeper, HBase gets informed usually in 30s that it should start the 
recovery process. 
This means reading the Write-Ahead-Log to replay the edits on the other servers.

In standard deployments, HBase processes (regionservers) are deployed on the same 
boxes as the datanodes.

It means that when the box stops, we've actually lost one of the replicas of the 
edits, as we lost both the regionserver and the datanode.

As HDFS marks a node as dead only after ~10 minutes, the dead node still appears 
as available when we try to read the blocks to recover. As such, we are delaying 
the recovery process by 60 seconds, as the read will usually fail with a socket 
timeout. If the file is still opened for writing, it adds an extra 20s, plus a 
risk of losing edits if we connect with ipc to the dead DN.


Possible solutions are:
- shorter dead datanodes detection by the NN. Requires a NN code change.
- better dead datanodes management in DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local 
one.
- reordering the blocks returned by the NN on the client side to put the blocks 
on the same DN as the dead RS at the end of the priority queue. Requires a DFS 
code change or a kind of workaround.

The solution retained is the last one. Compared to what was discussed on the 
mailing list, the proposed patch will not modify HDFS source code but adds a 
proxy. This for two reasons:
- Some HDFS functions managing block orders are static 
(MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require 
implementing the fix only partially, changing the DFS interface to make this 
function non-static, or making the hook static. None of these solutions is very 
clean. 
- Adding a proxy allows putting all the code in HBase, simplifying dependency 
management.

Nevertheless, it would be better to have this in HDFS. But this solution allows 
us to target the last version only, and this could allow minimal interface 
changes such as non-static methods.

Moreover, writing the blocks to a non-local DN would be an even better 
solution long term.
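The retained solution (reordering the blocks returned by the NN) can be sketched as below. The String host model is a deliberate simplification of HDFS's LocatedBlock/DatanodeInfo types, and the class name is invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: reorder a block's replica locations so the datanode co-located
// with the dead regionserver is tried last. Simplified model, not HDFS code.
public class BlockReorderer {
    static List<String> reorder(List<String> locations, String deadHost) {
        List<String> alive = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : locations) {
            (host.equals(deadHost) ? suspect : alive).add(host);
        }
        alive.addAll(suspect);  // dead-RS replica goes to the end of the queue
        return alive;
    }

    public static void main(String[] args) {
        List<String> locs = Arrays.asList("dn1", "deadbox", "dn3");
        System.out.println(reorder(locs, "deadbox"));  // [dn1, dn3, deadbox]
    }
}
```

In the proposed patch this reordering happens in a client-side proxy, so the DFSClient still times out on the dead DN only after the healthy replicas have been tried.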






--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419356#comment-13419356
 ] 

nkeywal commented on HBASE-6401:


@stack
bq. Does svn blame/git bisecting not turn up the issue that fixed this?
Will try.

 HBase may lose edits after a crash if used with HDFS 1.0.3 or older
 ---

 Key: HBASE-6401
 URL: https://issues.apache.org/jira/browse/HBASE-6401
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Priority: Critical
 Attachments: TestReadAppendWithDeadDN.java


 This comes from a hdfs bug, fixed in some hdfs versions. I haven't found the 
 hdfs jira for this.
 Context: HBase Write Ahead Log features. This is using hdfs append. If the 
 node crashes, the file that was written is read by other processes to replay 
 the action.
 - So we have in hdfs one (dead) process writing with another process reading.
 - But, despite the call to syncFs, we don't always see the data when we have 
 a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
 ignores the ipc errors and sets the length to 0.
 - So we may miss all the writes to the last block if we try to connect to the 
 dead DN.
 hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
 hdfs branch-2 or trunk: we should not have the issue (but not tested)
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
 The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419358#comment-13419358
 ] 

nkeywal commented on HBASE-6401:


HBASE-6435 will lower the probability of hitting the issue but will not solve 
it completely.





[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419360#comment-13419360
 ] 

Jean-Daniel Cryans commented on HBASE-6433:
---

Can we have a meaningful jira title with a meaningful description of the 
problem plus how it's being addressed?

 improve getRemoteAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Alex Baranau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranau updated HBASE-6411:


Attachment: HBASE-6411-1.patch

Adjusted Elliott's patch.

Added an example unit test for MasterMetrics that verifies a metrics value 
change. Had to create a MetricsAsserts shim in the test sources of the compat 
modules.

Please let me know what you think.

Will try to extract the maps in BaseMetricsSourceImpl(s) into a separate class 
and add support for MetricTags. I guess we agreed on that previously.

 Move Master Metrics to metrics 2
 

 Key: HBASE-6411
 URL: https://issues.apache.org/jira/browse/HBASE-6411
 Project: HBase
  Issue Type: Sub-task
Reporter: Elliott Clark
Assignee: Elliott Clark
 Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, 
 HBASE-6411_concept.patch


 Move Master Metrics to metrics 2





[jira] [Updated] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Description: 
Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
call.connection.socket.getInetAddress().

The host address is actually stored in HBaseServer.Connection.hostAddress 
field. We don't need to go through Socket to get this information.

Without this patch it costs 4000ns, with this patch it costs 1600ns

  was:Without this patch it costs 4000ns, with this patch it costs 1600ns

Summary: Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress  (was: improve getRemoteAddress)

@J-D:
See if updated description suffices.
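The updated description boils down to caching the address once per connection. A minimal sketch of the idea, using hypothetical class names rather than the actual HBaseServer.Connection code:

```java
import java.net.InetAddress;

// Hypothetical sketch of the optimization described above: capture the remote
// address once when the connection is set up, then return the cached field,
// instead of walking connection -> socket -> InetAddress on every call.
public class CachedRemoteAddress {
    private final String hostAddress;

    public CachedRemoteAddress(InetAddress remote) {
        // Computed once, at connection setup time.
        this.hostAddress = remote.getHostAddress();
    }

    public String getRemoteAddress() {
        // Cheap field read on every subsequent call.
        return hostAddress;
    }
}
```

The reported improvement (4000ns down to 1600ns per call) is consistent with replacing a method-call chain through the Socket with a plain field read.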

 Improve HBaseServer#getRemoteAddress by utilizing 
 HBaseServer.Connection.hostAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
 call.connection.socket.getInetAddress().
 The host address is actually stored in HBaseServer.Connection.hostAddress 
 field. We don't need to go through Socket to get this information.
 Without this patch it costs 4000ns, with this patch it costs 1600ns





[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419398#comment-13419398
 ] 

nkeywal commented on HBASE-6401:


@stack The oldest version of DFSInputStream.java (its split from DFSClient) is 
one year old and seems ok on trunk.





[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419399#comment-13419399
 ] 

Elliott Clark commented on HBASE-6411:
--

bq.Will try to extract maps in BaseMetricsSourceImpl(s) into separate class and 
add support for MetricTags. I guess we agreed on that previously.
Thanks.  That sounds great.





[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419408#comment-13419408
 ] 

Jean-Daniel Cryans commented on HBASE-6433:
---

Thank you!





[jira] [Created] (HBASE-6436) Netty should be moved off of snapshots.

2012-07-20 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-6436:


 Summary: Netty should be moved off of snapshots.
 Key: HBASE-6436
 URL: https://issues.apache.org/jira/browse/HBASE-6436
 Project: HBase
  Issue Type: Task
Reporter: Elliott Clark
Assignee: Elliott Clark


Netty is currently at 3.5.0.Final-SNAPSHOT; the released 3.5.0.Final should be 
used when possible so that snapshot repositories aren't queried when not needed.
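A hypothetical pom.xml fragment showing the intended pin; the exact Maven coordinates for Netty 3.5 are assumed here, not taken from the HBase pom:

```xml
<!-- Hypothetical pom.xml fragment: depend on the released artifact so Maven
     stops re-querying snapshot repositories on every build. -->
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty</artifactId>
  <version>3.5.0.Final</version>
</dependency>
```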





[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6435:
---

Attachment: 6435.unfinished.patch

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to recover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase usually gets informed within 30s that it should 
 start the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standard deployments, HBase processes (regionservers) are deployed on the 
 same boxes as the datanodes.
 It means that when a box stops, we've actually lost one replica of the edits, 
 as we lost both the regionserver and the datanode.
 As HDFS marks a node as dead only after ~10 minutes, it still appears as 
 available when we try to read the blocks to recover. As such, we are delaying 
 the recovery process by 60 seconds as the read will usually fail with a 
 socket timeout. If the file is still opened for writing, it adds an extra 20s 
 + a risk of losing edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanode detection by the NN. Requires a NN code change.
 - better dead datanode management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy, for two reasons:
 - Some HDFS functions managing block order are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require partially implementing the fix, changing the DFS interface to make 
 this function non-static, or making the hook static. None of these solutions 
 is very clean. 
 - Adding a proxy allows putting all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows targeting only the latest version, and this could allow minimal 
 interface changes such as non-static methods.
 Moreover, writing the blocks to a non-local DN would be an even better 
 long-term solution.
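The retained solution, reordering the located blocks on the client side, can be sketched as follows. This is an illustrative stand-in with hypothetical names, not the actual proxy in the patch, which operates on the HDFS block-location structures:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch (hypothetical names): given the replica locations the
// NN returned for a block, push the location co-located with the dead
// regionserver to the end, so healthy DNs are tried first and we avoid the
// socket-timeout penalty described above.
public class BlockLocationReorder {
    public static List<String> deprioritize(List<String> locations, String deadHost) {
        List<String> reordered = new ArrayList<>(locations);
        // List.sort is stable: the NN's original preference order is kept
        // among the healthy hosts; only the suspect host moves to the end.
        reordered.sort(Comparator.comparingInt(h -> h.equals(deadHost) ? 1 : 0));
        return reordered;
    }
}
```

For example, with locations [dn1, deadbox, dn3] and deadHost "deadbox", the read order becomes [dn1, dn3, deadbox], so the dead DN is only contacted if both healthy replicas fail.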





[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419411#comment-13419411
 ] 

Hadoop QA commented on HBASE-6411:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537375/HBASE-6411-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 39 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause mvn compile goal to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide
  org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2422//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2422//console

This message is automatically generated.





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419414#comment-13419414
 ] 

nkeywal commented on HBASE-6435:


The patch is not finished. Actually, it contains the code for the hdfs hook and 
the related test, but not the code for defining the location order from the 
file name. But as it is different from what we initially discussed, I post it 
here in case someone sees something I missed.

It does not mean it should not be fixed in hdfs as well, just that this is 
likely to be much simpler than patching the 1.0 branch...





[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419423#comment-13419423
 ] 

nkeywal commented on HBASE-6401:


HDFS-3222 is not exactly this one, but not far, and fixed on 2.0 as well.





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419446#comment-13419446
 ] 

Todd Lipcon commented on HBASE-6435:


I'm -1 on this kind of hack going into HBase before we add the feature to HDFS. 
I agree that adding to HDFS proper means we have to wait for a release, but 
this kind of code is likely to be really fragile. Also, without HBase driving 
requirements of HDFS, it will never evolve to natively have these kind of 
features, and HBase will devolve into a mess of reflection hacks to change 
around the HDFS internals.





[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419461#comment-13419461
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

Integrated to trunk.

Thanks for the patch, binlijin.

Thanks for the review, J-D.





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419491#comment-13419491
 ] 

Aditya Kishore commented on HBASE-6389:
---

bq. BTW what do Ri, C and Fi represent in the formula above ?

'n' is the number of tables in the cluster, *R*~i~ is the number of regions and 
*CF*~i~ is the number of column families in table 'i' ^\[1\]^.

1. [MSLAB is ON by 
default|http://hbase.apache.org/book/upgrade0.92.html#d1952e2965]

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.2

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
 HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
 testReplication.jstack


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
 the default of 1) can help prevent assignment of all regions to one (or a 
 small number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master will proceed immediately after the timeout 
 has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
 been reached.
 Reading the current conditions of waitForRegionServers() clarifies it
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 581 /**
 582  * Wait for the region servers to report in.
 583  * We will wait until one of this condition is met:
 584  *  - the master is stopped
 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 587  *region servers is reached
 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
 AND
 589  *   there have been no new region server in for
 590  *  'hbase.master.wait.on.regionservers.interval' time
 591  *
 592  * @throws InterruptedException
 593  */
 594 public void waitForRegionServers(MonitoredTask status)
 595 throws InterruptedException {
 
 
 612   while (
 613 !this.master.isStopped() &&
 614   slept < timeout &&
 615   count < maxToStart &&
 616   (lastCountChange+interval > now || count < minToStart)
 617 ){
 
 {code}
 So with the current conditions, the wait will end as soon as the timeout is 
 reached even if a smaller number of RSes have checked in with the Master, and 
 the master will proceed with the region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have disastrous effect in large cluster especially 
 now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
 these conditions need to be modified as follows
{code:title=ServerManager.java}
..
  /**
   * Wait for the region servers to report in.
   * We will wait until one of this condition is met:
   *  - the master is stopped
   *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
   *    region servers is reached
   *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
   *    there have been no new region server in for
   *    'hbase.master.wait.on.regionservers.interval' time AND
   *    the 'hbase.master.wait.on.regionservers.timeout' is reached
   *
   * @throws InterruptedException
   */
  public void waitForRegionServers(MonitoredTask status)
..
..
  int minToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.mintostart", 1);
  int maxToStart = this.master.getConfiguration().
      getInt("hbase.master.wait.on.regionservers.maxtostart", Integer.MAX_VALUE);
  if (maxToStart < minToStart) {
    maxToStart = minToStart;
  }
..
..
  while (
    !this.master.isStopped() &&
    count < maxToStart &&
    (lastCountChange + interval > now || timeout > slept || count < minToStart)
  ){
..
{code}
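For illustration, the proposed loop condition can be isolated as a pure predicate and exercised directly. This is a standalone sketch, not the actual patch: the parameter names mirror the variables in the loop above, while the class name and test values are made up.

```java
// Sketch of the proposed wait condition as a pure predicate.
// Returns true while the master should keep waiting for regionservers.
public class WaitQuorum {
    static boolean keepWaiting(boolean stopped, int count, int minToStart,
                               int maxToStart, long slept, long timeout,
                               long lastCountChange, long interval, long now) {
        return !stopped
            && count < maxToStart
            && (lastCountChange + interval > now   // an RS checked in recently
                || timeout > slept                 // the timeout has not lapsed yet
                || count < minToStart);            // the quorum is not reached yet
    }

    public static void main(String[] args) {
        // Timeout lapsed, quorum not reached: the old condition would stop
        // waiting here; the proposed one keeps waiting.
        System.out.println(keepWaiting(false, 1, 3, Integer.MAX_VALUE,
                5000, 4500, 0, 1500, 10000)); // true
        // Quorum reached, timeout lapsed, no recent check-in: stop waiting.
        System.out.println(keepWaiting(false, 3, 3, Integer.MAX_VALUE,
                5000, 4500, 0, 1500, 10000)); // false
    }
}
```

Note the wait only terminates once all three disjuncts are false, which matches the modified javadoc: quorum reached AND no recent check-in AND timeout lapsed.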

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419529#comment-13419529
 ] 

stack commented on HBASE-6389:
--

@Lars It's up to you (But since you asked, fine by me ... I like what you are 
doing though Aditya... thanks for the help).

 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.2

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
 HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
 testReplication.jstack


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 the default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed from 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, the Master will proceed immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been 
 reached.





[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419535#comment-13419535
 ] 

stack commented on HBASE-6433:
--

@binlijin Why not change what getRemoteIp does internally (your patch copies 
much of the body of getRemoteIp).  Is it that getRemoteIp is used in places 
where Call has not had host address set yet?

 Improve HBaseServer#getRemoteAddress by utilizing 
 HBaseServer.Connection.hostAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Currently, HBaseServer#getRemoteAddress calls getRemoteIp(), leading to 
 call.connection.socket.getInetAddress().
 The host address is actually stored in the HBaseServer.Connection.hostAddress 
 field, so we don't need to go through the Socket to get this information.
 Without this patch the call costs ~4000ns; with this patch it costs ~1600ns.
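The optimization described above amounts to caching the peer address once when the connection is set up instead of walking through the Socket on every call. A minimal standalone sketch of the idea, where the class and field names only mimic HBaseServer.Connection and are not the actual HBase code:

```java
import java.net.InetAddress;

// Sketch: cache the peer address at connection setup instead of
// querying the Socket on every getRemoteAddress() call.
public class CachedAddressConnection {
    private final InetAddress addr;   // cached once at construction
    private final String hostAddress; // cached textual form

    CachedAddressConnection(InetAddress peer) {
        this.addr = peer;
        this.hostAddress = peer.getHostAddress();
    }

    // Cheap: returns the cached string, no Socket lookup involved.
    String getRemoteAddress() {
        return hostAddress;
    }

    public static void main(String[] args) {
        CachedAddressConnection c =
            new CachedAddressConnection(InetAddress.getLoopbackAddress());
        System.out.println(c.getRemoteAddress());
    }
}
```

In the real server the cached value would be filled in when the connection is accepted, so later callers never touch `connection.socket` at all.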





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419551#comment-13419551
 ] 

stack commented on HBASE-6435:
--

Yeah, we should do both (I'd think that what's added to HDFS is more general 
than just this workaround scheme where local gets moved to the end of the list; 
i.e. we add being able to intercept the order returned by the NN and let a 
client-side policy alter it based on local knowledge if wanted. Could add 
other customizations, like being able to set the timeout per 
DFSInput/OutputStream as you've suggested up on the dev list N). Would be sweet 
if the 'hack' were available meantime while we wait on an hdfs release.

Looking at the patch, it looks like inventive hackery; good on you.

Do we have to do this in both master and regionserver?  Can't we do it in the 
HFileSystem constructor, assuming it takes a Conf (or would that be too late?)

+  HFileSystem.addLocationOrderHack(conf);

Rather than have it called a reorderProxy, call it an HBaseDFSClient? We might 
want to add more customizations while waiting on the HDFS fix to arrive.


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to recover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed, usually within 30s, that it should 
 start the recovery process.
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standard deployments, HBase processes (regionservers) are deployed on the 
 same boxes as the datanodes.
 It means that when a box stops, we've actually lost one copy of the edits, as 
 we lost both the regionserver and the datanode.
 As HDFS marks a node as dead only after ~10 minutes, it appears as available 
 when we try to read the blocks to recover. As such, we are delaying the 
 recovery process by 60 seconds, as the read will usually fail with a socket 
 timeout. If the file is still opened for writing, it adds an extra 20s, plus a 
 risk of losing edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanode detection by the NN. Requires a NN code change.
 - better dead datanode management in the DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy, for two reasons:
 - Some HDFS functions managing block order are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require partially implementing the fix, changing the DFS interface to make 
 this function non-static, or making the hook static. None of these solutions 
 is very clean. 
 - Adding a proxy allows us to put all the code in HBase, simplifying 
 dependency management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows us to target only the latest version, and this could allow minimal 
 interface changes such as non-static methods.
 Moreover, writing the blocks to a non-local DN would be an even better 
 long-term solution.
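The retained client-side reordering can be illustrated independently of HDFS: given the located replicas in NN priority order, move any replica hosted on the dead regionserver's box to the end of the list. In this simplified sketch plain hostnames stand in for DatanodeInfo; none of the names here come from the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: deprioritize replicas that live on the dead regionserver's host
// by moving them to the end of the NN-provided ordering.
public class ReplicaReorder {
    static List<String> deprioritize(List<String> replicaHosts, String deadHost) {
        List<String> good = new ArrayList<>();
        List<String> suspect = new ArrayList<>();
        for (String host : replicaHosts) {
            // Partition while preserving the NN's relative ordering.
            (host.equals(deadHost) ? suspect : good).add(host);
        }
        good.addAll(suspect);   // suspect replicas are tried last
        return good;
    }

    public static void main(String[] args) {
        List<String> order = deprioritize(
            List.of("dn1", "deadbox", "dn3"), "deadbox");
        System.out.println(order); // [dn1, dn3, deadbox]
    }
}
```

The read path then naturally falls back to the suspect replica only after the healthy ones, instead of burning a socket timeout on the dead box first.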





[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419552#comment-13419552
 ] 

Hudson commented on HBASE-6433:
---

Integrated in HBase-TRUNK #3155 (See 
[https://builds.apache.org/job/HBase-TRUNK/3155/])
HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java






[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419561#comment-13419561
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

Looking at the code, Connection has this member:
{code}
private InetAddress addr;
{code}
But I don't see where it is assigned. The following assignment is to a local 
variable:
{code}
  InetAddress addr = socket.getInetAddress();
{code}
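The pitfall Ted points at is classic field shadowing: re-declaring the variable's type inside a method assigns a fresh local and leaves the field null. A minimal standalone illustration (not the HBase code itself; the class and method names are made up):

```java
import java.net.InetAddress;

// Sketch of the shadowing pitfall: setBroken() assigns a new local,
// leaving the field null; setFixed() assigns the field itself.
public class ShadowingDemo {
    private InetAddress addr;

    void setBroken(InetAddress peer) {
        InetAddress addr = peer;   // BUG: declares a new local; field untouched
    }

    void setFixed(InetAddress peer) {
        this.addr = peer;          // assigns the field
    }

    InetAddress getAddr() {
        return addr;
    }

    public static void main(String[] args) {
        ShadowingDemo d = new ShadowingDemo();
        InetAddress peer = InetAddress.getLoopbackAddress();
        d.setBroken(peer);
        System.out.println(d.getAddr()); // null
        d.setFixed(peer);
        System.out.println(d.getAddr());
    }
}
```

Dropping the redundant `InetAddress` type token (or writing `this.addr = ...`) is the whole fix.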





[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419579#comment-13419579
 ] 

stack commented on HBASE-6401:
--

@Nkeywal We need another patch on top of hdfs-3222?

 HBase may lose edits after a crash if used with HDFS 1.0.3 or older
 ---

 Key: HBASE-6401
 URL: https://issues.apache.org/jira/browse/HBASE-6401
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.0
 Environment: all
Reporter: nkeywal
Priority: Critical
 Attachments: TestReadAppendWithDeadDN.java


 This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
 hdfs jira for this.
 Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
 node crashes, the file that was written is read by other processes to replay 
 the actions.
 - So we have in hdfs one (dead) process writing with another process reading.
 - But, despite the call to syncFs, we don't always see the data when we have 
 a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
 ignores the ipc errors and sets the length to 0.
 - So we may miss all the writes to the last block if we try to connect to the 
 dead DN.
 hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
 hdfs branch-2 or trunk: we should not have the issue (but not tested)
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
 The attached test will fail ~50% of the time.





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419581#comment-13419581
 ] 

nkeywal commented on HBASE-6435:


My thinking was it could make it into an hdfs release that accepts changing 
public interfaces. I fully agree with you Todd, we need to do our homework and 
push hdfs to ensure that what we need is understood and makes it into a 
release. On the other hand, if I look at how it worked for much simpler stuff 
like JUnit and surefire, our changes have been in their trunk for a few months 
and we're still waiting. These things take time. But I will do my homework on 
hdfs, I promise (I may need your help actually). The Jira will be created next 
week and if I have enough feedback I will propose a patch.

I was also wondering whether natively proposing interceptors would not be 
interesting for hdfs. They were available for a long time in an orb called 
Orbix and were great to use. But they would need to be per conf, so they 
cannot be implemented with static stuff.

bq. Do we have to do this in both master and regionserver? Can't do it in 
HFileSystem constructor assuming it takes a Conf (or that'd be too late?)
It can be put pretty late, basically before we start a recovery process. But 
we don't want it client side, so I will check this.

bq. Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might 
want to add more customizations while waiting on HDFS fix to arrive.
I've intercepted a lower-level call: I'm between the DFSClient and the 
namenode. This is because the DFSClient does more than just transferring 
calls: it contains some logic. Hence going in front of the namenode. But yes, 
I could make it more generic.






[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419585#comment-13419585
 ] 

nkeywal commented on HBASE-6401:


On 1.x, yes, I think that backporting hdfs-3222 won't be enough. On 2.0, it 
seems it's ok, even if I can't find the good soul who fixed it. As I can't find 
a jira, I can create a new one and propose a fix specific to branch-1. 





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419588#comment-13419588
 ] 

Todd Lipcon commented on HBASE-6435:


I think there's a good motivation to add these kinds of APIs generally to 
DFSInputStream. In particular, I think something like the following:

{code}
public List<Replica> getAvailableReplica(long pos); // return the list of available replicas at given file offset, in priority order
public void prioritizeReplica(Replica r); // move given replica to front of list
public void blacklistReplica(Replica r); // move replica to back of list
{code}
(or something of this sort)

The Replica API would then expose the datanode IDs (and after HDFS-3672, the 
disk ID).
So, in HBase we could simply open the file, enumerate the replicas, 
deprioritize the one on the suspected node, and move on with the normal code 
paths.
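The proposed operations can be prototyped over a plain list to show the intended prioritize/blacklist semantics. Here Replica is reduced to a String and the class name is illustrative; this is not an actual DFSInputStream API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed replica-priority operations on a plain list.
public class ReplicaPriorityList {
    private final List<String> replicas;

    ReplicaPriorityList(List<String> initial) {
        this.replicas = new ArrayList<>(initial);
    }

    List<String> getAvailableReplicas() {       // current priority order
        return new ArrayList<>(replicas);
    }

    void prioritizeReplica(String r) {          // move replica to front of list
        if (replicas.remove(r)) replicas.add(0, r);
    }

    void blacklistReplica(String r) {           // move replica to back of list
        if (replicas.remove(r)) replicas.add(r);
    }

    public static void main(String[] args) {
        ReplicaPriorityList p =
            new ReplicaPriorityList(List.of("dn1", "dn2", "dn3"));
        p.blacklistReplica("dn1");              // dn1 is on the suspected node
        System.out.println(p.getAvailableReplicas()); // [dn2, dn3, dn1]
    }
}
```

With such an API, HBase would not need a namenode proxy at all: it would just demote the suspected replica before reading.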





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419608#comment-13419608
 ] 

nkeywal commented on HBASE-6435:


I understand that you don't want to expose the internals nor something like 
DatanodeInfo. The same type of API would be useful for the output stream, 
putting priorities on nodes (and so reusing some knowledge about the dead 
nodes, or, for the wal, removing the local writes). It's simple and efficient.

With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when the node list is cleared 
after they have all failed. But maybe it can be changed as well.







[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419623#comment-13419623
 ] 

Todd Lipcon commented on HBASE-6435:


bq. With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when a node list is cleared when 
they all failed. But may be it can be changed as well.

Can you explain further what you mean here? What would you use these callbacks 
for?

 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to recover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standard deployments, HBase processes (regionservers) are deployed on the 
 same boxes as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This is for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require partially implementing the fix, changing the DFS interface to make 
 this function non static, or making the hook static. None of these solutions 
 is very clean. 
 - Adding a proxy allows putting all the code in HBase, simplifying dependency 
 management.
 Nevertheless, it would be better to have this in HDFS. But this solution 
 allows targeting the latest version only, and it could allow minimal 
 interface changes such as non static methods.
 Moreover, writing the blocks to a non-local DN would be an even better 
 long-term solution.





[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6321:
--

Attachment: HBASE-6321-0.94.patch

Had a stab at this. What I figured is that the getting of UUIDs was done 
outside of {{ReplicationZookeeper}} so it was missing the functionalities from 
that class (you can also see the feature envy that was going on there).

I refactored the ugly UUID stuff in {{ReplicationSource.run}} into 
{{ReplicationZookeeper.getPeerUUID}}. There I needed to handle the session 
expiration issues so I refactored that from another method into 
{{reconnectPeer}}. Now that that issue is handled, the possibility of a null 
UUID remained if the peer wasn't reachable, so I added a loop in 
{{ReplicationSource}}.

Finally I saw that we were doing the UUID dance in {{ReplicationSource.init}} 
for the current cluster so I pushed that to 
{{ReplicationZookeeper.getUUIDForCluster}} and refactored {{getPeerUUID}} to 
use it.

The code should be clearer and more reliable.
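The retry loop described above might look roughly like this. All names ({{PeerZk}}, {{waitForPeerUuid}}) and the retry bounds are assumptions for illustration, not the attached patch.

```java
import java.util.UUID;

// Sketch: keep asking for the peer's UUID, reconnecting on session
// expiration instead of letting the source die. Interface names are
// hypothetical stand-ins for the ReplicationZookeeper methods.
public class PeerUuidFetcher {
  interface PeerZk {
    UUID getPeerUUID() throws Exception;   // null if peer is unreachable
    void reconnectPeer() throws Exception; // reopen an expired ZK session
  }

  static UUID waitForPeerUuid(PeerZk zk, int maxAttempts, long sleepMs)
      throws InterruptedException {
    UUID peerId = null;
    for (int i = 0; i < maxAttempts && peerId == null; i++) {
      try {
        peerId = zk.getPeerUUID();
      } catch (Exception e) {
        try {
          zk.reconnectPeer(); // session expired: reopen instead of dying
        } catch (Exception ignored) {
          // keep looping; the peer may simply be down for a while
        }
      }
      if (peerId == null) {
        Thread.sleep(sleepMs);
      }
    }
    return peerId;
  }
}
```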

 ReplicationSource dies reading the peer's id
 

 Key: HBASE-6321
 URL: https://issues.apache.org/jira/browse/HBASE-6321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.96.0, 0.94.2

 Attachments: HBASE-6321-0.94.patch


 This is what I saw:
 {noformat}
 2012-07-01 05:04:01,638 ERROR 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
 source 8 because an error occurred: Could not read peer's cluster id
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /va1-backup/hbaseid
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
 at 
 org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
 {noformat}
 The session should just be reopened.





[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6321:
--

Fix Version/s: (was: 0.90.8)
 Assignee: Jean-Daniel Cryans

Assigning this to me and removing the 0.90 target since I found out that that 
part of the code was added in 0.92.

 ReplicationSource dies reading the peer's id
 

 Key: HBASE-6321
 URL: https://issues.apache.org/jira/browse/HBASE-6321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.96.0, 0.94.2

 Attachments: HBASE-6321-0.94.patch


 This is what I saw:
 {noformat}
 2012-07-01 05:04:01,638 ERROR 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
 source 8 because an error occurred: Could not read peer's cluster id
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /va1-backup/hbaseid
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
 at 
 org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
 {noformat}
 The session should just be reopened.





[jira] [Commented] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419633#comment-13419633
 ] 

Jean-Daniel Cryans commented on HBASE-6321:
---

Oh I forgot to mention that I ran TestReplication/Source/Manager 2 times each 
and they all passed.

 ReplicationSource dies reading the peer's id
 

 Key: HBASE-6321
 URL: https://issues.apache.org/jira/browse/HBASE-6321
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.96.0, 0.94.2

 Attachments: HBASE-6321-0.94.patch


 This is what I saw:
 {noformat}
 2012-07-01 05:04:01,638 ERROR 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
 source 8 because an error occurred: Could not read peer's cluster id
 org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
 = Session expired for /va1-backup/hbaseid
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
 at 
 org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
 at 
 org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
 {noformat}
 The session should just be reopened.





[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419646#comment-13419646
 ] 

nkeywal commented on HBASE-6435:


If I want to keep the existing interface:


Today, when you open a file, there is a call to a datanode if the file is also 
opened for writing somewhere. In HBase, we want the priorities to be taken into 
account during this opening, as we have a guess that one of these datanodes may 
be dead.

So either I register a callback that the DFSClient will call before using its 
list, or I change the 'open' interface to add the possibility to provide 
the list of replicas. Same thing for chooseDataNode called from blockSeekTo: 
even if we have a list at the beginning, this list is recreated during a read 
as a part of the retry process (in case the NN discovered new replicas on new 
datanodes).

If we put a callback:

We would offer this service:
{noformat}
class ReplicaSet {
  // return the list of available replicas at given file offset, in priority order
  public List<Replica> getAvailableReplica(long pos);
  // move given replica to front of list
  public void prioritizeReplica(Replica r);
  // move replica to back of list
  public void blacklistReplica(Replica r);
}
{noformat}


The client would need to implement this interface:
{noformat}
// Implement this interface and provide it to the DFSClient during its
// construction to manage the replica ordering
interface OrganizeReplicaSet {
  void organize(String fileName, ReplicaSet rs);
}
{noformat}

And the DFSClient code would become:
{noformat}
LocatedBlocks callGetBlockLocations(ClientProtocol namenode,
    String src, long start, long length) throws IOException {
  LocatedBlocks lbs = namenode.getBlockLocations(src, start, length);
  if (organizeReplicaSet != null) {
    ReplicaSet rs = lbs.getAsReplicaSet();
    try {
      organizeReplicaSet.organize(src, rs);
    } catch (Throwable t) {
      throw new IOException("OrganizeReplicaSet failed. class="
          + organizeReplicaSet.getClass(), t);
    }
    return new LocatedBlocks(rs);
  } else {
    return lbs;
  }
}
{noformat}

This is called from the DFSInputStream constructor in openInfo today.

In real life I would try to use the class ReplicaSet as an interface on the 
internal LocatedBlock(s) to limit the number of objects created. The callback 
could also be given as a parameter to the DFSInputStream constructor if there 
is a specific rule to apply...
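To make the proposal concrete, here is a hypothetical implementation of such a callback on the HBase side, using toy stand-ins for the {{Replica}}/{{ReplicaSet}} interfaces sketched above: replicas hosted on the dead region server's box get blacklisted so recovery reads try the other datanodes first.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposed callback. Replica/ReplicaSet are simplified
// stand-ins for the interfaces sketched in the comment, not HDFS classes.
public class WalReplicaOrganizer {
  static class Replica {
    final String host;
    Replica(String host) { this.host = host; }
  }

  static class ReplicaSet {
    final List<Replica> replicas = new ArrayList<>();
    void blacklistReplica(Replica r) { // move replica to back of list
      replicas.remove(r);
      replicas.add(r);
    }
  }

  interface OrganizeReplicaSet {
    void organize(String fileName, ReplicaSet rs);
  }

  // Build a callback that deprioritizes every replica on the dead RS's host.
  static OrganizeReplicaSet forDeadHost(String deadRsHost) {
    return (fileName, rs) -> {
      for (Replica r : new ArrayList<>(rs.replicas)) {
        if (r.host.equals(deadRsHost)) {
          rs.blacklistReplica(r); // likely-dead DN goes last
        }
      }
    };
  }
}
```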


 Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
 using dead datanodes
 

 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal
 Attachments: 6435.unfinished.patch


 HBase writes a Write-Ahead-Log to recover from hardware failure.
 This log is written with 'append' on hdfs.
 Through ZooKeeper, HBase gets informed usually in 30s that it should start 
 the recovery process. 
 This means reading the Write-Ahead-Log to replay the edits on the other 
 servers.
 In standard deployments, HBase processes (regionservers) are deployed on the 
 same boxes as the datanodes.
 It means that when the box stops, we've actually lost one of the edits, as we 
 lost both the regionserver and the datanode.
 As HDFS marks a node as dead after ~10 minutes, it appears as available when 
 we try to read the blocks to recover. As such, we are delaying the recovery 
 process by 60 seconds as the read will usually fail with a socket timeout. If 
 the file is still opened for writing, it adds an extra 20s + a risk of losing 
 edits if we connect with ipc to the dead DN.
 Possible solutions are:
 - shorter dead datanodes detection by the NN. Requires a NN code change.
 - better dead datanodes management in DFSClient. Requires a DFS code change.
 - NN customisation to write the WAL files on another DN instead of the local 
 one.
 - reordering the blocks returned by the NN on the client side to put the 
 blocks on the same DN as the dead RS at the end of the priority queue. 
 Requires a DFS code change or a kind of workaround.
 The solution retained is the last one. Compared to what was discussed on the 
 mailing list, the proposed patch will not modify HDFS source code but adds a 
 proxy. This is for two reasons:
 - Some HDFS functions managing block orders are static 
 (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
 require partially implementing the fix, changing the DFS interface to make 
 this function non static, or making the hook static. None of these solutions 

[jira] [Commented] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419648#comment-13419648
 ] 

Hudson commented on HBASE-5985:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-5985 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 (Revision 
1363561)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java


 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
 -

 Key: HBASE-5985
 URL: https://issues.apache.org/jira/browse/HBASE-5985
 Project: HBase
  Issue Type: Test
  Components: test
Affects Versions: 0.96.0
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0, 0.94.1

 Attachments: hbase-5985.patch


 ---
 Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< 
 FAILURE!
 org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD  Time elapsed: 0 
 sec  <<< ERROR!
 java.io.IOException: Failed put; errcode=1
 at 
 org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124)
 at 
 org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
 at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
 at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
 at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
 at 
 org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
 at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)





[jira] [Commented] (HBASE-6397) [hbck] print out bulk load commands for sidelined regions if necessary

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419654#comment-13419654
 ] 

Hudson commented on HBASE-6397:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6397 [hbck] print out bulk load commands for sidelined regions if 
necessary (Revision 1362247)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


 [hbck] print out bulk load commands for sidelined regions if necessary
 --

 Key: HBASE-6397
 URL: https://issues.apache.org/jira/browse/HBASE-6397
 Project: HBase
  Issue Type: Bug
  Components: hbck
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Trivial
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 6397-trunk.patch


 It would be better to print in the log the command line to bulk load sidelined 
 regions back, if any.
 Separate it out from HBASE-6392 since it is a different issue.





[jira] [Commented] (HBASE-6426) Add Hadoop 2.0.x profile to 0.92+

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419647#comment-13419647
 ] 

Hudson commented on HBASE-6426:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6426 Add Hadoop 2.0.x profile to 0.92+ (Revision 1363211)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/pom.xml


 Add Hadoop 2.0.x profile to 0.92+
 -

 Key: HBASE-6426
 URL: https://issues.apache.org/jira/browse/HBASE-6426
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.1

 Attachments: 6426.txt


 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.94 
 as well.





[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419649#comment-13419649
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6389 revert (Revision 1363193)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java


 Modify the conditions to ensure that Master waits for sufficient number of 
 Region Servers before starting region assignments
 

 Key: HBASE-6389
 URL: https://issues.apache.org/jira/browse/HBASE-6389
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.94.0, 0.96.0
Reporter: Aditya Kishore
Assignee: Aditya Kishore
Priority: Critical
 Fix For: 0.96.0, 0.94.2

 Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
 HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
 testReplication.jstack


 Continuing from HBASE-6375.
 It seems I was mistaken in my assumption that changing the value of 
 hbase.master.wait.on.regionservers.mintostart to a sufficient number (from 
 default of 1) can help prevent assignment of all regions to one (or a small 
 number of) region server(s).
 While this was the case in 0.90.x and 0.92.x, the behavior has changed in 
 0.94.0 onwards to address HBASE-4993.
 From 0.94.0 onwards, Master will proceed immediately after the timeout has 
 lapsed, even if hbase.master.wait.on.regionservers.mintostart has not been 
 reached.
 Reading the current conditions of waitForRegionServers() clarifies it
 {code:title=ServerManager.java (trunk rev:1360470)}
 
 581 /**
 582  * Wait for the region servers to report in.
 583  * We will wait until one of this condition is met:
 584  *  - the master is stopped
 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
 587  *region servers is reached
 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
 AND
 589  *   there have been no new region server in for
 590  *  'hbase.master.wait.on.regionservers.interval' time
 591  *
 592  * @throws InterruptedException
 593  */
 594 public void waitForRegionServers(MonitoredTask status)
 595 throws InterruptedException {
 
 
 612   while (
 613 !this.master.isStopped() &&
 614   slept < timeout &&
 615   count < maxToStart &&
 616   (lastCountChange + interval > now || count < minToStart)
 617 ){
 
 {code}
 So with the current conditions, the wait will end as soon as the timeout is 
 reached even if a smaller number of RSes have checked in with the Master, and 
 the master will proceed with the region assignment among these RSes alone.
 As mentioned in 
 -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
  and I concur, this could have a disastrous effect in a large cluster, 
 especially now that MSLAB is turned on.
 To enforce the required quorum as specified by 
 hbase.master.wait.on.regionservers.mintostart irrespective of timeout, 
 these conditions need to be modified as follows:
 {code:title=ServerManager.java}
 ..
   /**
* Wait for the region servers to report in.
* We will wait until one of this condition is met:
*  - the master is stopped
*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
*region servers is reached
*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
*   there have been no new region server in for
*  'hbase.master.wait.on.regionservers.interval' time AND
*   the 'hbase.master.wait.on.regionservers.timeout' is reached
*
* @throws InterruptedException
*/
   public void waitForRegionServers(MonitoredTask status)
 ..
 ..
 int minToStart = this.master.getConfiguration().
 getInt("hbase.master.wait.on.regionservers.mintostart", 1);
 int maxToStart = this.master.getConfiguration().
 getInt("hbase.master.wait.on.regionservers.maxtostart", 
 Integer.MAX_VALUE);
 if (maxToStart < minToStart) {
   maxToStart = minToStart;
 }
 ..
 ..
 while (
   !this.master.isStopped() &&
 count < maxToStart &&
 (lastCountChange + interval > now || timeout > slept || count < 
 minToStart)
   ){
 ..
 {code}
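The proposed loop condition can be modeled as a pure predicate (a toy model, not the actual ServerManager code) to make the behavioral change explicit: reaching the timeout alone no longer ends the wait while fewer than minToStart region servers have checked in.

```java
// Toy model of the proposed waitForRegionServers condition: returns true
// while the master should keep waiting for more region servers.
public class WaitPredicate {
  static boolean keepWaiting(boolean stopped, long slept, long timeout,
      int count, int minToStart, int maxToStart,
      long lastCountChange, long interval, long now) {
    return !stopped
        && count < maxToStart
        // keep waiting if RSes are still trickling in, OR the timeout has
        // not lapsed, OR the minToStart quorum is not yet reached
        && (lastCountChange + interval > now
            || timeout > slept
            || count < minToStart);
  }
}
```

With this predicate, a cluster that has slept past the timeout but only has one RS checked in (below minToStart) keeps waiting, which is the behavior the description argues for.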


[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419651#comment-13419651
 ] 

Hudson commented on HBASE-6406:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6406 Remove TestReplicationPeer (Revision 1363213)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java


 TestReplicationPeer.testResetZooKeeperSession and 
 TestZooKeeper.testClientSessionExpired fail frequently
 

 Key: HBASE-6406
 URL: https://issues.apache.org/jira/browse/HBASE-6406
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.1
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0, 0.94.1

 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack


 Looking back through the 0.94 test runs, these two tests accounted for 11 of 
 34 failed tests.
 They should be fixed or (temporarily) disabled.





[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419650#comment-13419650
 ] 

Hudson commented on HBASE-6319:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


 ReplicationSource can call terminate on itself and deadlock
 ---

 Key: HBASE-6319
 URL: https://issues.apache.org/jira/browse/HBASE-6319
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.94.1

 Attachments: HBASE-6319-0.92.patch


 In a few places in the ReplicationSource code calls terminate on itself which 
 is a problem since in terminate() we wait on that thread to die.





[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419652#comment-13419652
 ] 

Hudson commented on HBASE-4956:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob 
Copeland) (Revision 1363533)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java


 Control direct memory buffer consumption by HBaseClient
 ---

 Key: HBASE-4956
 URL: https://issues.apache.org/jira/browse/HBASE-4956
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Yu
Assignee: Bob Copeland
 Fix For: 0.96.0, 0.94.1

 Attachments: 4956.txt, thread_get.rb


 As Jonathan explained here 
 https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1
 , the standard HBase client inadvertently consumes a large amount of direct memory.
 We should consider using netty for NIO-related tasks.
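One way plain NIO ends up consuming direct memory is that the JDK caches a per-thread direct buffer as large as the largest heap buffer ever written through a channel. A common mitigation (shown here as a generic sketch, not necessarily what the attached patch does) is to cap the size of any single write:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class ChunkedWriter {
  // Cap any single write handed to NIO so the per-thread direct-buffer
  // cache stays bounded. 64 KB is an arbitrary illustrative choice.
  static final int CHUNK = 64 * 1024;

  // Assumes a blocking channel (write() makes progress on each call).
  static long writeChunked(WritableByteChannel ch, ByteBuffer src)
      throws IOException {
    long written = 0;
    while (src.hasRemaining()) {
      int oldLimit = src.limit();
      // expose at most one chunk of the buffer to the channel
      src.limit(Math.min(oldLimit, src.position() + CHUNK));
      written += ch.write(src);
      src.limit(oldLimit); // restore the full view
    }
    return written;
  }
}
```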





[jira] [Commented] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419657#comment-13419657
 ] 

Hudson commented on HBASE-6392:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6392 UnknownRegionException blocks hbck from sideline big overlap 
regions (Revision 1363202)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


 UnknownRegionException blocks hbck from sideline big overlap regions
 

 Key: HBASE-6392
 URL: https://issues.apache.org/jira/browse/HBASE-6392
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: 6392-0.90.patch, 6392-trunk.patch, 6392-trunk_v2.patch, 
 6392_0.92.patch


 Before sidelining a big overlap region, hbck first tries to close and offline 
 it.  However, it sometimes throws NotServingRegionException or 
 UnknownRegionException.
 It could be because the region is not open/assigned at all, or some other 
 issue.
 We should figure out why and fix it.
 By the way, it would be better to print in the log the command line to bulk 
 load sidelined regions back, if any. 





[jira] [Commented] (HBASE-6420) Gracefully shutdown logsyncer

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419653#comment-13419653
 ] 

Hudson commented on HBASE-6420:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6420 Gracefully shutdown logsyncer (Revision 1363416)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


 Gracefully shutdown logsyncer
 -

 Key: HBASE-6420
 URL: https://issues.apache.org/jira/browse/HBASE-6420
 Project: HBase
  Issue Type: Bug
  Components: wal
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Fix For: 0.96.0, 0.94.1

 Attachments: 6420-trunk.patch


 Currently, in closing an HLog, logSyncerThread is interrupted. The syncer 
 could be in the middle of syncing the writer.  We should avoid interrupting 
 the sync.
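A graceful alternative (a sketch under assumed names, not the attached patch) is to signal shutdown with a volatile flag and join the thread, so an in-flight sync always completes:

```java
// Sketch: periodic syncer that shuts down cooperatively instead of being
// interrupted mid-sync. LogSyncer and its fields are illustrative names.
public class LogSyncer implements Runnable {
  private volatile boolean closing = false;
  private final Thread thread = new Thread(this, "logSyncer");
  private final Runnable syncOp;
  private final long periodMs;

  LogSyncer(Runnable syncOp, long periodMs) {
    this.syncOp = syncOp;
    this.periodMs = periodMs;
  }

  public void start() { thread.start(); }

  @Override
  public void run() {
    while (!closing) {
      syncOp.run(); // never interrupted in the middle of a sync
      try {
        Thread.sleep(periodMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  public void close() throws InterruptedException {
    closing = true;      // request shutdown...
    thread.join(10_000); // ...then wait for the in-flight sync to finish
  }
}
```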





[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419658#comment-13419658
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


 [replication] Race in ReplicationSourceManager.init can initiate a failover 
 even if the node is alive
 -

 Key: HBASE-6325
 URL: https://issues.apache.org/jira/browse/HBASE-6325
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.6, 0.92.1, 0.94.0
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch


 Yet another bug found during the leap second madness: it's possible to miss 
 the registration of new region servers, so that in 
 ReplicationSourceManager.init we start the failover of a live and replicating 
 region server. I don't think there's data loss, but the RS that's being failed 
 over will die on:
 {noformat}
 2012-07-01 06:25:15,604 FATAL 
 org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
 sv4r23s48,10304,1341112194623: Writing replication status
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
 at 
 org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
 at 
 org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
 {noformat}
 It seems to me that just refreshing {{otherRegionServers}} after getting the 
 list of {{currentReplicators}} would be enough to fix this.
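The proposed fix can be illustrated with a small sketch. Everything here is hypothetical (the class name, the stand-in server lists, and `liveServers()` are invented); in the real code both lists come from ZooKeeper. The idea is simply to re-read the live region server list after fetching the replicator znodes, so a server that registered in between is not mistaken for a dead one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the fix: refresh the live-RS list right after
// reading the replicator znodes, instead of trusting a stale snapshot.
public class FailoverCheck {
    // Stand-ins for the ZK reads done in ReplicationSourceManager.init().
    static List<String> currentReplicators = Arrays.asList("rs-a", "rs-b", "rs-new");
    static List<String> liveServers() { return Arrays.asList("rs-a", "rs-b", "rs-new"); }

    static List<String> serversToFailOver(List<String> otherRegionServers) {
        // The fix: re-read the live list here rather than using the snapshot
        // taken before the replicator list was fetched.
        otherRegionServers = liveServers();
        List<String> dead = new ArrayList<>();
        for (String rs : currentReplicators) {
            if (!otherRegionServers.contains(rs)) dead.add(rs);  // only truly dead RSs
        }
        return dead;
    }

    public static void main(String[] args) {
        // Stale snapshot taken before "rs-new" registered:
        List<String> stale = Arrays.asList("rs-a", "rs-b");
        System.out.println(serversToFailOver(stale));  // prints [] — no bogus failover
    }
}
```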





[jira] [Commented] (HBASE-6382) Upgrade Jersey to 1.8 to match Hadoop 1 and 2

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419656#comment-13419656
 ] 

Hudson commented on HBASE-6382:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6382 Upgrade Jersey to 1.8 to match Hadoop 1 and 2 (David S. Wang) 
(Revision 1362308)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/pom.xml


 Upgrade Jersey to 1.8 to match Hadoop 1 and 2
 -

 Key: HBASE-6382
 URL: https://issues.apache.org/jira/browse/HBASE-6382
 Project: HBase
  Issue Type: Improvement
  Components: rest
Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.2
Reporter: David S. Wang
Assignee: David S. Wang
 Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1

 Attachments: HBASE-6382-trunk.patch


 Upgrade Jersey dependency from 1.4 to 1.8 to match Hadoop dependencies.





[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419655#comment-13419655
 ] 

Hudson commented on HBASE-5966:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory 
Chanan) (Revision 1363586)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


 MapReduce based tests broken on Hadoop 2.0.0-alpha
 --

 Key: HBASE-5966
 URL: https://issues.apache.org/jira/browse/HBASE-5966
 Project: HBase
  Issue Type: Bug
  Components: mapred, mapreduce, test
Affects Versions: 0.94.0, 0.96.0
 Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
 Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
Reporter: Andrew Purtell
Assignee: Jimmy Xiang
 Fix For: 0.96.0, 0.94.1

 Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
 HBASE-5966.patch, hbase-5966.patch


 Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
 rigging. Below is a representative error; it can easily be reproduced with:
 {noformat}
 mvn -PlocalTests -Psecurity \
   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
   clean test \
   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 {noformat}
 And the result:
 {noformat}
 ---
  T E S T S
 ---
 Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
  <<< FAILURE!
 ---
 Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
 ---
 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
  <<< FAILURE!
 testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  
 Time elapsed: 21.935 sec  <<< ERROR!
 java.lang.reflect.UndeclaredThrowableException
   at 
 org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
   at 
 org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
   at 
 org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
   at 
 org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:616)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
   at 
 org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
   at 

[jira] [Resolved] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-6310.
---

   Resolution: Invalid
Fix Version/s: (was: 0.94.2)
   (was: 0.96.0)

I'm resolving this as invalid. I was thrown in the wrong direction by what I 
thought were old/new .META. rows (they in fact never changed), whereas it was a 
.META. region from almost 3 years ago that was brought back to life. It could 
have been something like HBASE-6417 that happened, but since I don't have those 
logs anymore I can't be 100% sure until I reproduce the issue.

 -ROOT- corruption when .META. is using the old encoding scheme
 --

 Key: HBASE-6310
 URL: https://issues.apache.org/jira/browse/HBASE-6310
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.0
Reporter: Jean-Daniel Cryans
Priority: Blocker

 We're still working on the root cause here, but after the leap second 
 armageddon we had a hard time getting our 0.94 cluster back up. This is what 
 we saw in the logs until the master died by itself:
 {noformat}
 2012-07-01 23:01:52,149 DEBUG
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
 locateRegionInMeta parentTable=-ROOT-,
 metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
 port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
 because: HRegionInfo was null or empty in -ROOT-,
 row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
 .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
 {noformat}
 (it's strange that we retry this)
 This was really misleading because I could see the regioninfo in a scan:
 {noformat}
 hbase(main):002:0> scan '-ROOT-'
 ROW   COLUMN+CELL
  .META.,,1column=info:regioninfo,
 timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '',
 ENDKEY => '', ENCODED => 1028785192,}
  .META.,,1column=info:server,
 timestamp=1341183448693, value=sfor3s40:10304
  .META.,,1
 column=info:serverstartcode, timestamp=1341183448693,
 value=1341183444689
  .META.,,1column=info:v,
 timestamp=1331755419291, value=\x00\x00
  .META.,,1259448304806column=info:server,
 timestamp=1341124914705, value=sfor3s24:10304
  .META.,,1259448304806
 column=info:serverstartcode, timestamp=1341124914705,
 value=1341124455863
 {noformat}
 Except that the devil is in the details: .META.,,1 is not 
 .META.,,1259448304806. Basically something writes to .META. by directly 
 creating the row key without checking whether the row is in the old format. I 
 did a deleteall in the shell and it fixed the issue... until some time later 
 it was stuck again because the edits reappeared (still not sure why). This 
 time the PostOpenDeployTasksThread threads were stuck in the RS trying to 
 update .META., but there was no logging (saw it with a jstack). I deleted the 
 row again to make it work.
 I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 
 out, but I wouldn't recommend upgrading to 0.94 if your cluster was created 
 before 0.89.





[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419691#comment-13419691
 ] 

Jimmy Xiang commented on HBASE-6228:


I'd like to fix this in HBASE-6381 by making sure SSH blocks on 
AM.processServerShutdown until the master has joined the cluster and has fixed 
missing daughters.

 Fixup daughters twice  cause daughter region assigned twice
 ---

 Key: HBASE-6228
 URL: https://issues.apache.org/jira/browse/HBASE-6228
 Project: HBase
  Issue Type: Bug
  Components: master
Reporter: chunhui shen
Assignee: chunhui shen
 Fix For: 0.96.0

 Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
 HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch


 First, how does fixing up daughters twice happen?
 1. We will fixupDaughters at the end of HMaster#finishInitialization.
 2. ServerShutdownHandler will fixupDaughters when reassigning a region through 
 ServerShutdownHandler#processDeadRegion.
 When we fixupDaughters, we add the daughters to .META., but that couldn't 
 prevent the above case, because of FindDaughterVisitor.
 The details are as follows:
 Suppose region A is a split parent region, and its daughter region B is 
 missing.
 1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds 
 daughter region B to .META. with serverName=null and assigns the daughter.
 2. Then, the Master's initialization thread will also find that daughter 
 region B is missing and assign it, because FindDaughterVisitor considers a 
 daughter missing if its serverName=null.
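The ordering fix Jimmy proposes above can be sketched with a latch. This is an illustrative toy, not the HBASE-6381 patch; all names are invented. The shutdown handler waits until the master has joined and finished its own fixup, so only one thread ever assigns daughter region B.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: SSH blocks until the master has joined the cluster
// and fixed up missing daughters, so the fixup happens exactly once.
public class FixupOrdering {
    static String run() throws InterruptedException {
        CountDownLatch masterJoined = new CountDownLatch(1);
        StringBuilder log = new StringBuilder();
        Thread ssh = new Thread(() -> {
            try {
                masterJoined.await();                       // SSH waits instead of racing
                synchronized (log) { log.append("ssh-check;"); }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        ssh.start();
        synchronized (log) { log.append("master-fixup;"); } // master fixes daughters first
        masterJoined.countDown();                           // then releases SSH
        ssh.join();
        return log.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());  // prints master-fixup;ssh-check;
    }
}
```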





[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419692#comment-13419692
 ] 

Hudson commented on HBASE-6433:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #101 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/101/])
HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java


 Improve HBaseServer#getRemoteAddress by utilizing 
 HBaseServer.Connection.hostAddress
 

 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Assignee: binlijin
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
 HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch


 Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
 call.connection.socket.getInetAddress().
 The host address is actually stored in HBaseServer.Connection.hostAddress 
 field. We don't need to go through Socket to get this information.
 Without this patch the call costs ~4000 ns; with the patch, ~1600 ns.
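A minimal sketch of the idea, with invented names (`ConnectionAddress` stands in for HBaseServer.Connection): resolve the peer address once when the connection is set up, then return the cached string instead of walking connection → socket → getInetAddress() on every call.

```java
import java.net.InetSocketAddress;

// Hypothetical sketch: cache the remote address at connection setup,
// like the Connection.hostAddress field, so getRemoteAddress() is cheap.
public class ConnectionAddress {
    private final String hostAddress;  // computed once, at construction

    ConnectionAddress(InetSocketAddress peer) {
        this.hostAddress = peer.getAddress().getHostAddress();
    }

    // No Socket lookup involved on the hot path.
    String getRemoteAddress() { return hostAddress; }

    public static void main(String[] args) {
        ConnectionAddress c =
            new ConnectionAddress(new InetSocketAddress("127.0.0.1", 60020));
        System.out.println(c.getRemoteAddress());  // prints 127.0.0.1
    }
}
```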





[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419695#comment-13419695
 ] 

Lars Hofhansl commented on HBASE-6428:
--

That is an excellent point.
Should also think about HBASE-6427 with this in mind.

 Pluggable Compaction policies
 -

 Key: HBASE-6428
 URL: https://issues.apache.org/jira/browse/HBASE-6428
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 For some use cases it is useful to allow more control over how KVs get 
 compacted.
 For example, one could envision storing old versions of a KV in separate 
 HFiles, which then rarely have to be touched/cached by queries querying for 
 new data.
 In addition, these date-ranged HFiles can easily be used for backups while 
 maintaining historical data.
 This would be a major change, allowing compactions to provide multiple 
 targets (not just a filter).





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419696#comment-13419696
 ] 

Lars Hofhansl commented on HBASE-5547:
--

+1 on patch. Ted pinged me that he is out already.
Since this is a Salesforce patch, I should commit it anyway.

Will do so as soon as I get to it.

Jesse, do you have a feeling about how different a 0.94 patch would be?

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be possible to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.
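Option 3 above (a general trash directory) can be sketched with plain java.nio file moves. This is an illustration under invented names and paths, not the committed archiving code; the real patch works against HDFS via the FileSystem API, where a rename within one filesystem is equally cheap.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: instead of deleting an HFile, move it into a
// trash/archive directory so a running backup can still read it.
public class ArchiveInsteadOfDelete {
    static Path archive(Path hfile, Path archiveDir) throws IOException {
        Files.createDirectories(archiveDir);
        Path target = archiveDir.resolve(hfile.getFileName());
        // A same-filesystem rename is cheap; no data is copied.
        return Files.move(hfile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hfiles");
        Path hfile = Files.createFile(dir.resolve("abc123"));
        Path moved = archive(hfile, dir.resolve(".archive"));
        System.out.println(Files.exists(hfile) + " " + Files.exists(moved));
        // prints false true
    }
}
```

A cleaner chore (like the 5-minute archive cleaner mentioned later in this thread) would then delete archived files once no backup holds them.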





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419697#comment-13419697
 ] 

Jesse Yates commented on HBASE-5547:


@Lars I don't think it would be all that different. I'll take a crack at it 
next week (after dealing with the next round of HBASE-6055 stuff).

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be possible to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.





[jira] [Commented] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419722#comment-13419722
 ] 

Lars Hofhansl commented on HBASE-5659:
--

Without parent the revised test fails every time. With parent it fails rarely.
I do not know what the issue is.

This only happens when the test does heavy flushing (during the course of the 
test > 1000 flushes happen), so the problem might be there.

I can offer to disable the test or to reduce the number of flushes for now, but 
of course that papers over the problem.

I also would not mind if somebody else had a look at the test to check whether 
the test logic itself is flawed.

 TestAtomicOperation.testMultiRowMutationMultiThreads is still failing 
 occasionally
 --

 Key: HBASE-5659
 URL: https://issues.apache.org/jira/browse/HBASE-5659
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.96.0


 See run here: 
 https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/
 {quote}
 2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): 
 Storescanner.peek() is changed where before = 
 rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = 
 rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0
 2012-03-27 04:36:12,629 INFO  [Thread-121] regionserver.HRegion(1558): 
 Finished memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81. in 14ms, 
 sequenceid=7927, compaction requested=true
 2012-03-27 04:36:12,629 DEBUG [Thread-126] 
 regionserver.TestAtomicOperation$2(362): flushing
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): 
 Started memstore flush for 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., current region 
 memstore size 1.9k
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): 
 Finished snapshotting 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., commencing wait 
 for mvcc, flushsize=1968
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): 
 Finished snapshotting, commencing flushing stores
 2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating 
 file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
  with permission=rwxrwxrwx
 2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): 
 Initialized with CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
 2012-03-27 04:36:12,631 INFO  [Thread-126] 
 regionserver.StoreFile$Writer(997): Delete Family Bloom filter type for 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57:
  CompoundBloomFilterWriter
 2012-03-27 04:36:12,632 INFO  [Thread-126] 
 regionserver.StoreFile$Writer(1220): NO General Bloom and NO DeleteFamily was 
 added to HFile 
 (/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57)
  
 2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.Store(770): Flushed , 
 sequenceid=7934, memsize=1.9k, into tmp file 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
 2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming 
 flushed file at 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
  to 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57
 2012-03-27 04:36:12,634 INFO  [Thread-126] regionserver.Store(818): Added 
 

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419723#comment-13419723
 ] 

Lars Hofhansl commented on HBASE-5547:
--

I also verified in a real setup that an HFile is indeed archived and (by 
default) removed after 5 mins. I was thrown off at first because the 
table/region/cf directory is not removed when empty.
I also made sure I can create/drop tables and that .META. is backed up 
correctly.

So still +1 :)


 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be possible to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.





[jira] [Updated] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally

2012-07-20 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5659:
-

Fix Version/s: 0.94.2

 TestAtomicOperation.testMultiRowMutationMultiThreads is still failing 
 occasionally
 --

 Key: HBASE-5659
 URL: https://issues.apache.org/jira/browse/HBASE-5659
 Project: HBase
  Issue Type: Sub-task
Reporter: Lars Hofhansl
Priority: Minor
 Fix For: 0.96.0, 0.94.2


 See run here: 
 https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/
 {quote}
 2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): 
 Storescanner.peek() is changed where before = 
 rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = 
 rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0
 2012-03-27 04:36:12,629 INFO  [Thread-121] regionserver.HRegion(1558): 
 Finished memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81. in 14ms, 
 sequenceid=7927, compaction requested=true
 2012-03-27 04:36:12,629 DEBUG [Thread-126] 
 regionserver.TestAtomicOperation$2(362): flushing
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): 
 Started memstore flush for 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., current region 
 memstore size 1.9k
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): 
 Finished snapshotting 
 testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., commencing wait 
 for mvcc, flushsize=1968
 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): 
 Finished snapshotting, commencing flushing stores
 2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating 
 file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
  with permission=rwxrwxrwx
 2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): 
 Initialized with CacheConfig:enabled [cacheDataOnRead=true] 
 [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
 [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
 2012-03-27 04:36:12,631 INFO  [Thread-126] 
 regionserver.StoreFile$Writer(997): Delete Family Bloom filter type for 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57:
  CompoundBloomFilterWriter
 2012-03-27 04:36:12,632 INFO  [Thread-126] 
 regionserver.StoreFile$Writer(1220): NO General Bloom and NO DeleteFamily was 
 added to HFile 
 (/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57)
  
 2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.Store(770): Flushed , 
 sequenceid=7934, memsize=1.9k, into tmp file 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
 2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming 
 flushed file at 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
  to 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57
 2012-03-27 04:36:12,634 INFO  [Thread-126] regionserver.Store(818): Added 
 /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57,
  entries=12, sequenceid=7934, filesize=1.3k
 2012-03-27 04:36:12,642 DEBUG [Thread-118] 
 regionserver.TestAtomicOperation$2(392): []
 Exception in thread Thread-118 junit.framework.AssertionFailedError at 
 junit.framework.Assert.fail(Assert.java:48)
   

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419724#comment-13419724
 ] 

Lars Hofhansl commented on HBASE-5547:
--

Ok. One more question I asked just now on RB:

From Matteo:
{quote}
MasterFileSystem contains deleteRegion() and deleteTable(), which call 
fs.delete() with the recursive flag on.
These two methods get called by DeleteTableHandler (drop table).
In a backup/snapshot situation we want to keep the regions/hfiles.
{quote}

My follow up question:
{quote}
I see that deleteRegion() was addressed, but not deleteTable().

That means if a table is dropped, the HFiles would be deleted and not 
archived.
So it seems we should either:
- also delete the table's archive directory (since it would be 
incomplete anyway), or
- archive all the HFiles before deleting them.

What do you think Jesse?
{quote}
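The archive-before-delete behavior discussed above can be sketched roughly as follows. This is an illustrative sketch only, using java.nio.file on the local filesystem as a stand-in for HDFS (the analogous HDFS calls would be FileSystem.rename() and FileSystem.delete()); the method and directory names here are hypothetical, not the actual patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch: instead of recursively deleting a table directory, first rename
// its contents into an archive directory so an in-progress backup can
// still see the HFiles. A rename is a metadata-only move, so no data is copied.
public class ArchiveBeforeDelete {
  public static void archiveAndDelete(Path tableDir, Path archiveRoot) throws IOException {
    Path tableArchive = archiveRoot.resolve(tableDir.getFileName().toString());
    Files.createDirectories(tableArchive);
    try (Stream<Path> children = Files.list(tableDir)) {
      for (Path child : (Iterable<Path>) children::iterator) {
        // move each region/HFile under the archive root
        Files.move(child, tableArchive.resolve(child.getFileName().toString()));
      }
    }
    Files.delete(tableDir); // directory is empty now; no recursive delete needed
  }

  public static void main(String[] args) throws IOException {
    Path root = Files.createTempDirectory("hbase-demo");
    Path table = Files.createDirectories(root.resolve("testtable"));
    Files.writeString(table.resolve("hfile1"), "data");
    Path archive = root.resolve(".archive");
    archiveAndDelete(table, archive);
    System.out.println(Files.exists(archive.resolve("testtable").resolve("hfile1")));
    System.out.println(Files.exists(table));
  }
}
```

Running the demo prints `true` (the HFile survives under the archive) and then `false` (the table directory is gone).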



 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way one should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.
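Option 3 above (a general trash directory) could look roughly like the sketch below. This is a hedged illustration, not HBase code: the backup-mode check is abstracted as a BooleanSupplier standing in for a ZooKeeper znode watch, and all class and directory names are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.function.BooleanSupplier;

// Sketch: a cleaner that deletes an HFile normally, but moves it to a
// trash directory instead while a backup is in progress.
public class HFileCleanerSketch {
  private final BooleanSupplier backupInProgress; // stand-in for a znode check
  private final Path trashDir;

  public HFileCleanerSketch(BooleanSupplier backupInProgress, Path trashDir) {
    this.backupInProgress = backupInProgress;
    this.trashDir = trashDir;
  }

  /** Delete the HFile, or move it to trash if a backup is running. */
  public void remove(Path hfile) throws IOException {
    if (backupInProgress.getAsBoolean()) {
      Files.createDirectories(trashDir);
      Files.move(hfile, trashDir.resolve(hfile.getFileName().toString()),
          StandardCopyOption.REPLACE_EXISTING);
    } else {
      Files.delete(hfile);
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("cleaner-demo");
    Path f = Files.writeString(dir.resolve("hfile"), "x");
    HFileCleanerSketch cleaner = new HFileCleanerSketch(() -> true, dir.resolve(".trash"));
    cleaner.remove(f);
    System.out.println(Files.exists(dir.resolve(".trash").resolve("hfile")));
  }
}
```

With the backup flag set, the demo prints `true`: the file ends up in the trash directory rather than being deleted, so a backup can still copy it.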

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419725#comment-13419725
 ] 

Lars Hofhansl edited comment on HBASE-5547 at 7/21/12 2:49 AM:
---

Or is it that all regions are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)?

Edit: Spelling

  was (Author: lhofhansl):
Or is it that all region are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)
  
 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be delete to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way it should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419725#comment-13419725
 ] 

Lars Hofhansl commented on HBASE-5547:
--

Or is it that all region are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way one should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.





[jira] [Commented] (HBASE-5954) Allow proper fsync support for HBase

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419738#comment-13419738
 ] 

Lars Hofhansl commented on HBASE-5954:
--

I think the API could go multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc., very lightweight), enabled 
with a config option (default on I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# sync per Put. Gives control to the application. A batch put would hsync the 
WAL if at least one Put in the batch was marked with hsync. What about deletes? 
In 0.94 they are not batched; we could do it at the end of the operation there.
# Per RPC. Could send a flag with the RPC from the client. I.e. HTable would 
have a put(List<Put> puts, boolean hsync) method
# HTable.hsync. Client calls this when data must be sync'ed. Most flexible, but 
incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.
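The per-Put option above (hsync the WAL once per batch if any Put in the batch asked for it) can be sketched as below. The Put and WAL classes here are simplified stand-ins for illustration only, not the actual HBase client or WAL API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a per-operation hsync flag, where a whole batch forces a single
// WAL hsync if at least one Put in it requested durability.
public class PerPutSyncDemo {
  static class Put {
    final byte[] row;
    final boolean hsync;          // per-operation durability request
    Put(byte[] row, boolean hsync) { this.row = row; this.hsync = hsync; }
  }

  static class Wal {
    int appends = 0, hsyncs = 0;
    void append(Put p) { appends++; }
    void hsync() { hsyncs++; }    // on HDFS this would call FSDataOutputStream.hsync()
  }

  /** Append a whole batch, then hsync once if any Put requested it. */
  static void batchPut(Wal wal, List<Put> puts) {
    boolean syncNeeded = false;
    for (Put p : puts) {
      wal.append(p);
      syncNeeded |= p.hsync;
    }
    if (syncNeeded) wal.hsync();  // one sync amortized over the batch
  }

  public static void main(String[] args) {
    Wal wal = new Wal();
    List<Put> batch = new ArrayList<>();
    batch.add(new Put(new byte[] {1}, false));
    batch.add(new Put(new byte[] {2}, true));
    batchPut(wal, batch);
    System.out.println(wal.appends + " appends, " + wal.hsyncs + " hsync");
  }
}
```

The demo prints `2 appends, 1 hsync`: two edits are appended but only one sync is issued, which is the amortization that makes the per-Put flag cheaper than syncing every WAL edit.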


 Allow proper fsync support for HBase
 

 Key: HBASE-5954
 URL: https://issues.apache.org/jira/browse/HBASE-5954
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0, 0.94.2

 Attachments: 5954-trunk-hdfs-trunk-v2.txt, 
 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk-v4.txt, 
 5954-trunk-hdfs-trunk-v5.txt, 5954-trunk-hdfs-trunk-v6.txt, 
 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt








[jira] [Comment Edited] (HBASE-5954) Allow proper fsync support for HBase

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419738#comment-13419738
 ] 

Lars Hofhansl edited comment on HBASE-5954 at 7/21/12 4:01 AM:
---

I think the API could go multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc., very lightweight), enabled 
with a config option (default on I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# hsync for tables or column families for HFiles (configured in the 
table/column descriptor)
# hsync for tables or column families for the WAL (configured in the 
table/column descriptor)
# WAL hsync per Put. Gives control to the application. A batch put would hsync 
the WAL if at least one Put in the batch was marked with hsync. What about 
deletes? In 0.94 they are not batched; we could do it at the end of the 
operation there.
# WAL hsync per RPC. Could send a flag with the RPC from the client. I.e. HTable 
would have a put(List<Put> puts, boolean hsync) method
# HTable.hsync. Client calls this when the WAL must be sync'ed. Most flexible, 
but incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.

Edit: Forgot some options.

  was (Author: lhofhansl):
I think the API going multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc, very lightweight), enabled 
with a config option (default on I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# sync per Put. Gives control to the application. A batch put would hsync the 
WAL if at least one Put in the batch was market with hsync. What about deletes? 
In 0.94 they are not batched; could it at the end of operation there.
# Per RPC. Could send flag with the RPC from the client. I.e. HTable would have 
a Put(ListPut puts, boolean hsync) method
# HTable.hsync. Client calls this when data must be sync'ed. Most flexible, but 
incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.

  
 Allow proper fsync support for HBase
 

 Key: HBASE-5954
 URL: https://issues.apache.org/jira/browse/HBASE-5954
 Project: HBase
  Issue Type: Improvement
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0, 0.94.2

 Attachments: 5954-trunk-hdfs-trunk-v2.txt, 
 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk-v4.txt, 
 5954-trunk-hdfs-trunk-v5.txt, 5954-trunk-hdfs-trunk-v6.txt, 
 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt








[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-20 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6406:
-

Fix Version/s: (was: 0.94.1)
   0.94.2

 TestReplicationPeer.testResetZooKeeperSession and 
 TestZooKeeper.testClientSessionExpired fail frequently
 

 Key: HBASE-6406
 URL: https://issues.apache.org/jira/browse/HBASE-6406
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.1
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.96.0, 0.94.2

 Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack


 Looking back through the 0.94 test runs these two tests accounted for 11 of 
 34 failed tests.
 They should be fixed or (temporarily) disabled.





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419746#comment-13419746
 ] 

Zhihong Ted Yu commented on HBASE-5547:
---

I think the latest patch has addressed HFile archival when deleteRegion() is 
called.

Backing up / restoring table can be addressed in HBASE-6055.

 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way one should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.





[jira] [Commented] (HBASE-5547) Don't delete HFiles when in backup mode

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419748#comment-13419748
 ] 

Lars Hofhansl commented on HBASE-5547:
--

True, but if dropping a table just drops the latest HFiles on the floor and 
leaves a partial backup around, this entire exercise is pointless.

Anyway, from the code in DeleteTableHandler.handleTableOperation it looks like 
all regions are deleted first (using deleteRegion) and then the table directory 
is deleted, so it should be correct. Just making sure here.


 Don't delete HFiles when in backup mode
 -

 Key: HBASE-5547
 URL: https://issues.apache.org/jira/browse/HBASE-5547
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Jesse Yates
 Fix For: 0.94.2

 Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
 hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
 java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
 java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
 java_HBASE-5547_v7.patch


 This came up in a discussion I had with Stack.
 It would be nice if HBase could be notified that a backup is in progress (via 
 a znode for example) and in that case either:
 1. rename HFiles to be deleted to file.bck
 2. rename the HFiles into a special directory
 3. rename them to a general trash directory (which would not need to be tied 
 to backup mode).
 That way one should be able to get a consistent backup based on HFiles (HDFS 
 snapshots or hard links would be better options here, but we do not have 
 those).
 #1 makes cleanup a bit harder.
