[jira] [Created] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)
binlijin created HBASE-6433:
---

 Summary: improve getRemoteAddress
 Key: HBASE-6433
 URL: https://issues.apache.org/jira/browse/HBASE-6433
 Project: HBase
  Issue Type: Improvement
Reporter: binlijin
Priority: Minor




[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Status: Patch Available  (was: Open)

> Filter with filterRow() returning true is also incompatible with scan with 
> limit
> 
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both limit and a Filter with 
> filterRow(List&lt;KeyValue&gt;) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.
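
A minimal sketch of the up-front check this implies, assuming "limit" here maps to Scan.setBatch() and using the real Filter.hasFilterRow() hook (the enclosing method name is illustrative, not the attached patch):
{code}
// Sketch only: reject the batch/limit + filterRow() combination when the
// scanner is built, instead of silently returning partially filtered rows.
void checkFilterCompatibility(Scan scan, Filter filter) {
  if (scan.getBatch() > 0 && filter != null && filter.hasFilterRow()) {
    throw new IncompatibleFilterException(
        "Cannot set batch on a scan using a filter that implements filterRow()");
  }
}
{code}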

[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419046#comment-13419046
 ] 

Zhihong Ted Yu commented on HBASE-3725:
---

How about renaming leftResults to remainingResults?

Please prepare a patch for trunk.

Thanks

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes th

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Attachment: HBASE-6433-trunk.patch

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>


[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Attachment: HBASE-6433-94.patch
HBASE-6433-90.patch
HBASE-6433-92.patch

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>


[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is also incompatible with scan with limit

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419049#comment-13419049
 ] 

Hadoop QA commented on HBASE-6429:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537300/hbase-6429-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.TestCheckTestClasses

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2419//console

This message is automatically generated.

> Filter with filterRow() returning true is also incompatible with scan with 
> limit
> 
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both limit and a Filter with 
> filterRow(List&lt;KeyValue&gt;) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread binlijin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

binlijin updated HBASE-6433:


Description: Without this patch it costs 4000ns; with this patch it costs 
1600ns
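
The reported gain is consistent with caching the remote address once per connection instead of re-deriving it on every call; a hedged sketch of that shape of change (illustrative, not the attached patch):
{code}
import java.net.Socket;

// Sketch only: compute the remote address string when the connection is set
// up, then serve the cached value from getHostAddress() on the hot path.
class ConnectionAddressCache {
  private final String hostAddress;

  ConnectionAddressCache(Socket socket) {
    // one-time lookup at connection setup
    this.hostAddress = socket.getInetAddress().getHostAddress();
  }

  String getHostAddress() {
    return hostAddress;  // cheap field read instead of a fresh lookup
  }
}
{code}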

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Status: Patch Available  (was: Open)

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Summary: Filter with filterRow() returning true is incompatible with scan 
with limit  (was: Filter with filterRow() returning true is also incompatible 
with scan with limit)

> Filter with filterRow() returning true is incompatible with scan with limit
> ---
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both limit and a Filter with 
> filterRow(List&lt;KeyValue&gt;) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.

[jira] [Commented] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419124#comment-13419124
 ] 

Zhihong Ted Yu commented on HBASE-6429:
---

TestFilterWithScanLimits.java and FilterWrapper.java need the Apache license header.

{code}
+if(null == filter) {
{code}
There should be a space between if and (.
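That is:
{code}
+if (null == filter) {
{code}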

Why does TestFilterWithScanLimits have a main() method?
It should be classified as a medium test.
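For reference, that classification is done with a JUnit category annotation on the test class (class body elided):
{code}
// The small/medium/large test split is driven by a category annotation:
@Category(MediumTests.class)
public class TestFilterWithScanLimits {
  // test methods ...
}
{code}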

> Filter with filterRow() returning true is incompatible with scan with limit
> ---
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both limit and a Filter with 
> filterRow(List&lt;KeyValue&gt;) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.

[jira] [Updated] (HBASE-6429) Filter with filterRow() returning true is incompatible with scan with limit

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6429:
--

Status: Open  (was: Patch Available)

> Filter with filterRow() returning true is incompatible with scan with limit
> ---
>
> Key: HBASE-6429
> URL: https://issues.apache.org/jira/browse/HBASE-6429
> Project: HBase
>  Issue Type: Bug
>  Components: filters
>Affects Versions: 0.96.0
>Reporter: Jason Dai
> Attachments: hbase-6429-trunk.patch, hbase-6429_0_94_0.patch
>
>
> Currently if we scan with both limit and a Filter with 
> filterRow(List&lt;KeyValue&gt;) implemented, an IncompatibleFilterException will 
> be thrown. The same exception should also be thrown if the filter has its 
> filterRow() implemented.

[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419152#comment-13419152
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

The trunk patch contains a reordering of imports, which distracts from the 
goal of this JIRA.

Otherwise patch looks good.

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Assigned] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-6433:
-

Assignee: binlijin

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419154#comment-13419154
 ] 

Hadoop QA commented on HBASE-6433:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537328/HBASE-6433-trunk.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2420//console

This message is automatically generated.

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: HBASE-6433-90.patch, HBASE-6433-92.patch, 
> HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Updated] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Attachment: 6433-getRemoteAddress-trunk.txt

Simplified patch for trunk.

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419207#comment-13419207
 ] 

Zhihong Ted Yu commented on HBASE-5547:
---

Will integrate in 3 hours if there is no objection.

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way it should be possible to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.
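
A hedged sketch of option 3, assuming a plain rename into a trash directory (names are illustrative, not the committed implementation):
{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: move the HFile into a trash/archive directory instead of
// deleting it, so a backup process can still copy a consistent set of files.
class HFileTrashSketch {
  static boolean moveToTrash(FileSystem fs, Path hfile, Path trashRoot)
      throws IOException {
    fs.mkdirs(trashRoot);  // ensure the trash directory exists
    return fs.rename(hfile, new Path(trashRoot, hfile.getName()));
  }
}
{code}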

[jira] [Commented] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419255#comment-13419255
 ] 

Zhihong Ted Yu commented on HBASE-3725:
---

In trunk, the getLastIncrement() call has been replaced with:
{code}
  List&lt;KeyValue&gt; results = get(get, false);
{code}

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: Jonathan Gray
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>   

[jira] [Assigned] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu reassigned HBASE-3725:
-

Assignee: ShiXing  (was: Jonathan Gray)

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: ShiXing
> Fix For: 0.92.2
>
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
>* Shoul

[jira] [Updated] (HBASE-3725) HBase increments from old value after delete and write to disk

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-3725:
--

Fix Version/s: 0.92.2

> HBase increments from old value after delete and write to disk
> --
>
> Key: HBASE-3725
> URL: https://issues.apache.org/jira/browse/HBASE-3725
> Project: HBase
>  Issue Type: Bug
>  Components: io, regionserver
>Affects Versions: 0.90.1
>Reporter: Nathaniel Cook
>Assignee: ShiXing
> Fix For: 0.92.2
>
> Attachments: HBASE-3725-0.92-V1.patch, HBASE-3725-0.92-V2.patch, 
> HBASE-3725-0.92-V3.patch, HBASE-3725-0.92-V4.patch, HBASE-3725-0.92-V5.patch, 
> HBASE-3725-0.92-V6.patch, HBASE-3725-Test-v1.patch, HBASE-3725-v3.patch, 
> HBASE-3725.patch
>
>
> Deleted row values are sometimes used for starting points on new increments.
> To reproduce:
> Create a row "r". Set column "x" to some default value.
> Force hbase to write that value to the file system (such as restarting the 
> cluster).
> Delete the row.
> Call table.incrementColumnValue with "some_value"
> Get the row.
> The returned value in the column was incremented from the old value before 
> the row was deleted instead of being initialized to "some_value".
> Code to reproduce:
> {code}
> import java.io.IOException;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.HColumnDescriptor;
> import org.apache.hadoop.hbase.HTableDescriptor;
> import org.apache.hadoop.hbase.client.Delete;
> import org.apache.hadoop.hbase.client.Get;
> import org.apache.hadoop.hbase.client.HBaseAdmin;
> import org.apache.hadoop.hbase.client.HTableInterface;
> import org.apache.hadoop.hbase.client.HTablePool;
> import org.apache.hadoop.hbase.client.Increment;
> import org.apache.hadoop.hbase.client.Result;
> import org.apache.hadoop.hbase.util.Bytes;
> public class HBaseTestIncrement
> {
>   static String tableName  = "testIncrement";
>   static byte[] infoCF = Bytes.toBytes("info");
>   static byte[] rowKey = Bytes.toBytes("test-rowKey");
>   static byte[] newInc = Bytes.toBytes("new");
>   static byte[] oldInc = Bytes.toBytes("old");
>   /**
>* This code reproduces a bug with increment column values in hbase
>* Usage: First run part one by passing '1' as the first arg
>*Then restart the hbase cluster so it writes everything to disk
>*Run part two by passing '2' as the first arg
>*
>* This will result in the old deleted data being found and used for 
> the increment calls
>*
>* @param args
>* @throws IOException
>*/
>   public static void main(String[] args) throws IOException
>   {
>   if("1".equals(args[0]))
>   partOne();
>   if("2".equals(args[0]))
>   partTwo();
>   if ("both".equals(args[0]))
>   {
>   partOne();
>   partTwo();
>   }
>   }
>   /**
>* Creates a table and increments a column value 10 times by 10 each 
> time.
>* Results in a value of 100 for the column
>*
>* @throws IOException
>*/
>   static void partOne()throws IOException
>   {
>   Configuration conf = HBaseConfiguration.create();
>   HBaseAdmin admin = new HBaseAdmin(conf);
>   HTableDescriptor tableDesc = new HTableDescriptor(tableName);
>   tableDesc.addFamily(new HColumnDescriptor(infoCF));
>   if(admin.tableExists(tableName))
>   {
>   admin.disableTable(tableName);
>   admin.deleteTable(tableName);
>   }
>   admin.createTable(tableDesc);
>   HTablePool pool = new HTablePool(conf, Integer.MAX_VALUE);
>   HTableInterface table = pool.getTable(Bytes.toBytes(tableName));
>   //Increment uninitialized column
>   for (int j = 0; j < 10; j++)
>   {
>   table.incrementColumnValue(rowKey, infoCF, oldInc, 
> (long)10);
>   Increment inc = new Increment(rowKey);
>   inc.addColumn(infoCF, newInc, (long)10);
>   table.increment(inc);
>   }
>   Get get = new Get(rowKey);
>   Result r = table.get(get);
>   System.out.println("initial values: new " + 
> Bytes.toLong(r.getValue(infoCF, newInc)) + " old " + 
> Bytes.toLong(r.getValue(infoCF, oldInc)));
>   }
>   /**
>* First deletes the data then increments the column 10 times by 1 each 
> time
>*
>* Should result in a value of 1

[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419263#comment-13419263
 ] 

Andrew Purtell commented on HBASE-6432:
---

Seems reasonable and low risk to pull the ID from ZooKeeper.

> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0
>Reporter: Francis Liu
>Assignee: Francis Liu
> Fix For: 0.96.0
>
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of an HRegionServer this is bypassed and the ID is 
> set to "default", since getMaster() uses HBaseRPC to create the proxy 
> directly and bypasses the class which retrieves and sets the correct 
> clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication: the token's service will be the 
> correct clusterId, while the TokenSelector is looking for one with service 
> "default".

[jira] [Updated] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell updated HBASE-6432:
--

Affects Version/s: 0.96.0
Fix Version/s: (was: 0.96.0)

> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of an HRegionServer this is bypassed and the ID is 
> set to "default", since getMaster() uses HBaseRPC to create the proxy 
> directly and bypasses the class which retrieves and sets the correct 
> clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication: the token's service will be the 
> correct clusterId, while the TokenSelector is looking for one with service 
> "default".

[jira] [Comment Edited] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271
 ] 

Andrew Purtell edited comment on HBASE-6432 at 7/20/12 3:44 PM:


However, the master is responsible for publishing the cluster ID to ZooKeeper. 
If on a fresh install the regionservers are started first, then they won't find 
the ID up in ZK until the master comes up. I think this should be a Chore that 
retries until the ID is found then exits.

  was (Author: apurtell):
However, the master is responsible for publishing the cluster ID to 
ZooKeeper. If on a fresh install the regionservers are started first, then they 
won't find the ID up in ZK. I think this should be a Chore that retries until 
the ID is found then exits.
  
> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of an HRegionServer this is bypassed and the ID is 
> set to "default", since getMaster() uses HBaseRPC to create the proxy 
> directly and bypasses the class which retrieves and sets the correct 
> clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication: the token's service will be the 
> correct clusterId, while the TokenSelector is looking for one with service 
> "default".

[jira] [Commented] (HBASE-6432) HRegionServer doesn't properly set clusterId in conf

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419271#comment-13419271
 ] 

Andrew Purtell commented on HBASE-6432:
---

However, the master is responsible for publishing the cluster ID to ZooKeeper. 
If on a fresh install the regionservers are started first, then they won't find 
the ID up in ZK. I think this should be a Chore that retries until the ID is 
found then exits.
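
A hedged sketch of such a retry loop (the ZK read is a hypothetical helper; the period and config key are assumptions):
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch only: retry until the master has published the cluster ID in
// ZooKeeper, set it into the regionserver's conf, then exit.
class ClusterIdRetriever implements Runnable {
  private final Configuration conf;

  ClusterIdRetriever(Configuration conf) {
    this.conf = conf;
  }

  public void run() {
    while (!Thread.currentThread().isInterrupted()) {
      String id = readClusterIdFromZk();   // hypothetical ZK helper
      if (id != null) {
        conf.set("hbase.cluster.id", id);  // assumed config key
        return;                            // ID found: the chore exits
      }
      try {
        Thread.sleep(10 * 1000);           // arbitrary retry period
      } catch (InterruptedException e) {
        return;
      }
    }
  }

  private String readClusterIdFromZk() {
    return null;  // placeholder: would read the cluster ID znode
  }
}
{code}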

> HRegionServer doesn't properly set clusterId in conf
> 
>
> Key: HBASE-6432
> URL: https://issues.apache.org/jira/browse/HBASE-6432
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Francis Liu
>Assignee: Francis Liu
> Attachments: HBASE-6432_94.patch
>
>
> ClusterId is normally set into the passed conf during instantiation of an 
> HTable class. In the case of an HRegionServer this is bypassed and the ID is 
> set to "default", since getMaster() uses HBaseRPC to create the proxy 
> directly and bypasses the class which retrieves and sets the correct 
> clusterId. 
> This becomes a problem for clients (i.e. within a coprocessor) using 
> delegation tokens for authentication: the token's service will be the 
> correct clusterId, while the TokenSelector is looking for one with service 
> "default".

[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-20 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419279#comment-13419279
 ] 

Andrew Purtell commented on HBASE-6428:
---

bq. For example one could envision storing old versions of a KV in separate 
HFiles, which then rarely have to be touched/cached by queries querying for new 
data. In addition these date-ranged HFiles can easily be used for backups while 
maintaining historical data.

I'd be curious whether you think the Coprocessor API for compactions cannot be 
reworked to handle this.

> Pluggable Compaction policies
> -
>
> Key: HBASE-6428
> URL: https://issues.apache.org/jira/browse/HBASE-6428
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>
> For some use cases it is useful to allow more control over how KVs get compacted.
> For example one could envision storing old versions of a KV in separate HFiles, 
> which then rarely have to be touched/cached by queries querying for new data.
> In addition these date-ranged HFiles can easily be used for backups while 
> maintaining historical data.
> This would be a major change, allowing compactions to provide multiple 
> targets (not just a filter).
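
A purely illustrative shape for that idea (no such interface existed at the time; all names here are hypothetical):
{code}
import org.apache.hadoop.hbase.KeyValue;

// Illustrative only: a compaction policy that can route each KV to one of
// several output files (e.g. by timestamp range) instead of a single target.
interface MultiTargetCompactionPolicy {
  /** Number of output HFiles this compaction should produce. */
  int numTargets();

  /** Pick the output file for a KV, e.g. old versions go to a history file. */
  int selectTarget(KeyValue kv);
}
{code}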

[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419285#comment-13419285
 ] 

Hadoop QA commented on HBASE-6433:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12537359/6433-getRemoteAddress-trunk.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 12 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
  org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2421//console

This message is automatically generated.

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Created] (HBASE-6434) Document effect of slow compressors on the flush path and workaround in the online book

2012-07-20 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-6434:
-

 Summary: Document effect of slow compressors on the flush path and 
workaround in the online book
 Key: HBASE-6434
 URL: https://issues.apache.org/jira/browse/HBASE-6434
 Project: HBase
  Issue Type: Task
Reporter: Andrew Purtell
Priority: Minor


In HBASE-6423 Karthik writes
bq. 1. flushing a memstore takes a while (GZIP compression)
[... and the memstore gate comes crashing down]

We once sidestepped this issue by specifying different compression options for 
flushes (LZO or none) and major compaction (BZIP2), disabling automatic major 
compaction, and managing major compaction from a shell-based process that 
iterates over each region on disk and makes some application-specific decisions.

I go back and forth on whether this is a hack or legitimate HBase ops given how 
things currently work.
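
For reference, a hedged sketch of how that workaround maps onto per-family settings, assuming HColumnDescriptor.setCompactionCompressionType() is available in the running version (GZ stands in for BZIP2, which is not a built-in HBase algorithm):
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.hfile.Compression;

// Sketch only: cheap (or no) compression on flush, heavier compression
// reserved for compactions, and automatic major compactions disabled.
class CompressionWorkaroundSketch {
  static HColumnDescriptor configure(Configuration conf) {
    conf.setLong("hbase.hregion.majorcompaction", 0);  // disable automatic majors
    HColumnDescriptor hcd = new HColumnDescriptor("f1");
    hcd.setCompressionType(Compression.Algorithm.NONE);          // fast flushes
    hcd.setCompactionCompressionType(Compression.Algorithm.GZ);  // on compaction
    return hcd;
  }
}
{code}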

[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419316#comment-13419316
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

I ran the two tests listed above and they passed:
{code}
Running org.apache.hadoop.hbase.master.TestSplitLogManager
Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 21.933 sec
Running org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 68.194 sec
{code}

Will integrate to trunk later today if there is no objection.

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

[jira] [Created] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)
nkeywal created HBASE-6435:
--

 Summary: Reading WAL files after a recovery leads to time lost in 
HDFS timeouts when using dead datanodes
 Key: HBASE-6435
 URL: https://issues.apache.org/jira/browse/HBASE-6435
 Project: HBase
  Issue Type: Improvement
  Components: master, regionserver
Affects Versions: 0.96.0
Reporter: nkeywal
Assignee: nkeywal


HBase writes a Write-Ahead-Log to recover from hardware failure.
This log is written with 'append' on hdfs.
Through ZooKeeper, HBase gets informed usually in 30s that it should start the 
recovery process. 
This means reading the Write-Ahead-Log to replay the edits on the other servers.

In standard deployments, HBase processes (regionservers) are deployed on the same 
boxes as the datanodes.

This means that when the box stops, we've actually lost one of the replicas of 
the edits, as we lost both the regionserver and the datanode.

Because HDFS only marks a node as dead after ~10 minutes, the dead node still 
appears available when we try to read the blocks to recover. As such, we delay 
the recovery process by 60 seconds, as the read will usually fail with a socket 
timeout. If the file is still open for writing, it adds an extra 20s plus a 
risk of losing edits if we connect with ipc to the dead DN.


Possible solutions are:
- shorter dead-datanode detection by the NN. Requires an NN code change.
- better dead-datanode management in DFSClient. Requires a DFS code change.
- NN customisation to write the WAL files on another DN instead of the local 
one.
- reordering the blocks returned by the NN on the client side to put the blocks 
on the same DN as the dead RS at the end of the priority queue. Requires a DFS 
code change or a kind of workaround.

The solution retained is the last one. Compared to what was discussed on the 
mailing list, the proposed patch will not modify HDFS source code but adds a 
proxy. This is for two reasons:
- Some HDFS functions managing block order are static 
(MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would require 
implementing the fix only partially, changing the DFS interface to make this 
function non-static, or making the hook static. None of these solutions is very 
clean. 
- Adding a proxy allows all the code to stay in HBase, simplifying dependency 
management.

Nevertheless, it would be better to have this in HDFS. But this solution allows 
us to target only the latest version, and it could allow minimal interface 
changes such as non-static methods.

Moreover, writing the blocks to a non-local DN would be an even better 
solution long term.
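
A hedged sketch of the reordering itself, assuming the proxy can rewrite the located blocks before the DFSClient uses them (class and method names are illustrative, not the committed patch):
{code}
import java.util.Arrays;
import java.util.Comparator;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

// Sketch only: push replicas hosted on the dead regionserver's box to the
// end of each block's location list so healthy datanodes are tried first.
class BlockReorderSketch {
  static void deprioritizeHost(LocatedBlocks blocks, final String deadHost) {
    for (LocatedBlock lb : blocks.getLocatedBlocks()) {
      DatanodeInfo[] locs = lb.getLocations();  // assumed reorderable in place
      Arrays.sort(locs, new Comparator<DatanodeInfo>() {
        public int compare(DatanodeInfo a, DatanodeInfo b) {
          int ra = deadHost.equals(a.getHostName()) ? 1 : 0;
          int rb = deadHost.equals(b.getHostName()) ? 1 : 0;
          return ra - rb;  // the dead host sorts last
        }
      });
    }
  }
}
{code}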






[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419356#comment-13419356
 ] 

nkeywal commented on HBASE-6401:


@stack
bq. Does svn blame/git bisecting not turn up the issue that fixed this?
Will try.

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: HBase Write Ahead Log features. This is using hdfs append. If the 
> node crashes, the file that was written is read by other processes to replay 
> the action.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50 of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419358#comment-13419358
 ] 

nkeywal commented on HBASE-6401:


HBASE-6435 will lower the probability to get the issue but will not solve it 
totally.

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
> node crashes, the file that was being written is read by other processes to 
> replay the actions.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) improve getRemoteAddress

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419360#comment-13419360
 ] 

Jean-Daniel Cryans commented on HBASE-6433:
---

Can we have a meaningful jira title with a meaningful description of the 
problem plus how it's being addressed?

> improve getRemoteAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Alex Baranau (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Baranau updated HBASE-6411:


Attachment: HBASE-6411-1.patch

Adjusted Elliott's patch.

Added an example unit test for MasterMetrics that verifies a metric value 
change. Had to create a MetricsAsserts shim in the test sources of the compat 
modules.

Please let me know what you think.

Will try to extract the maps in BaseMetricsSourceImpl(s) into a separate class 
and add support for MetricTags. I guess we agreed on that previously.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, 
> HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Ted Yu updated HBASE-6433:
--

Description: 
Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
call.connection.socket.getInetAddress().

The host address is actually stored in the HBaseServer.Connection.hostAddress 
field. We don't need to go through the Socket to get this information.

Without this patch it costs 4000ns; with this patch it costs 1600ns

  was:Without this patch it costs 4000ns, with this patch it costs 1600ns

Summary: Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress  (was: improve getRemoteAddress)

@J-D:
See if updated description suffices.
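
For illustration, the change amounts to roughly the following sketch. Apart 
from getRemoteIp()/getRemoteAddress() and Connection.hostAddress, which the 
description names, the surrounding details (e.g. the CurCall thread-local) are 
assumptions about the shape of the code, not the committed patch.

{code:title=Assumed shape of the change}
// Before: every call walks through the Socket.
public static String getRemoteAddress() {
  Call call = CurCall.get();  // thread-local holding the RPC call being served
  if (call == null) return null;
  InetAddress addr = call.connection.socket.getInetAddress();
  return addr == null ? null : addr.getHostAddress();
}

// After: reuse the host address cached on the Connection at accept time.
public static String getRemoteAddress() {
  Call call = CurCall.get();
  return call == null ? null : call.connection.hostAddress;
}
{code}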

> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419398#comment-13419398
 ] 

nkeywal commented on HBASE-6401:


@stack The oldest version of DFSInputStream.java (it's split from DFSClient) is 
one year old and seems ok on trunk.

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
> node crashes, the file that was being written is read by other processes to 
> replay the actions.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419399#comment-13419399
 ] 

Elliott Clark commented on HBASE-6411:
--

bq. Will try to extract the maps in BaseMetricsSourceImpl(s) into a separate 
class and add support for MetricTags. I guess we agreed on that previously.
Thanks.  That sounds great.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, 
> HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419408#comment-13419408
 ] 

Jean-Daniel Cryans commented on HBASE-6433:
---

Thank you!

> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-6436) Netty should be moved off of snapshots.

2012-07-20 Thread Elliott Clark (JIRA)
Elliott Clark created HBASE-6436:


 Summary: Netty should be moved off of snapshots.
 Key: HBASE-6436
 URL: https://issues.apache.org/jira/browse/HBASE-6436
 Project: HBase
  Issue Type: Task
Reporter: Elliott Clark
Assignee: Elliott Clark


Netty is currently at 3.5.0.Final-SNAPSHOT; the final 3.5.0.Final should be 
used when possible so that snapshot repositories aren't queried when not 
needed.
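
Presumably this is a one-line pom.xml change of roughly the following shape 
(the coordinates below are assumed from the Netty 3.x era):

{code:xml}
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty</artifactId>
  <!-- was 3.5.0.Final-SNAPSHOT, which forces snapshot-repository lookups -->
  <version>3.5.0.Final</version>
</dependency>
{code}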

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nkeywal updated HBASE-6435:
---

Attachment: 6435.unfinished.patch

> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> This means that when a box stops, we have actually lost one replica of the 
> edits, as we lost both the regionserver and the datanode.
> As HDFS takes ~10 minutes to mark a node as dead, the dead node still appears 
> available when we try to read the blocks to recover. As such, the recovery 
> process is delayed by 60 seconds because the read will usually fail with a 
> socket timeout. If the file is still open for writing, this adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires an NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a workaround of some kind.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify the HDFS source code but 
> adds a proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows us to put all the code in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows us to target only the latest version, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6411) Move Master Metrics to metrics 2

2012-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419411#comment-13419411
 ] 

Hadoop QA commented on HBASE-6411:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12537375/HBASE-6411-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 39 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The patch appears to cause mvn compile goal to fail.

-1 findbugs.  The patch appears to cause Findbugs (version 1.3.9) to fail.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide
  org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2422//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2422//console

This message is automatically generated.

> Move Master Metrics to metrics 2
> 
>
> Key: HBASE-6411
> URL: https://issues.apache.org/jira/browse/HBASE-6411
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Elliott Clark
>Assignee: Elliott Clark
> Attachments: HBASE-6411-0.patch, HBASE-6411-1.patch, 
> HBASE-6411_concept.patch
>
>
> Move Master Metrics to metrics 2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419414#comment-13419414
 ] 

nkeywal commented on HBASE-6435:


The patch is not finished. Actually, it contains the code for the hdfs hook and 
the related test, but not the code for defining the location order from the 
file name. But as it is different from what we initially discussed, I am 
posting it here in case someone sees something I missed.

It does not mean it should not be fixed in hdfs as well, just that this is 
likely to be much simpler than patching the 1.0 branch...

> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> This means that when a box stops, we have actually lost one replica of the 
> edits, as we lost both the regionserver and the datanode.
> As HDFS takes ~10 minutes to mark a node as dead, the dead node still appears 
> available when we try to read the blocks to recover. As such, the recovery 
> process is delayed by 60 seconds because the read will usually fail with a 
> socket timeout. If the file is still open for writing, this adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires an NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a workaround of some kind.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify the HDFS source code but 
> adds a proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows us to put all the code in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows us to target only the latest version, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419423#comment-13419423
 ] 

nkeywal commented on HBASE-6401:


HDFS-3222 is not exactly this one, but not far off, and it was fixed on 2.0 as 
well.

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
> node crashes, the file that was being written is read by other processes to 
> replay the actions.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419446#comment-13419446
 ] 

Todd Lipcon commented on HBASE-6435:


I'm -1 on this kind of hack going into HBase before we add the feature to HDFS. 
I agree that adding it to HDFS proper means we have to wait for a release, but 
this kind of code is likely to be really fragile. Also, without HBase driving 
the requirements of HDFS, it will never evolve to natively have these kinds of 
features, and HBase will devolve into a mess of reflection hacks that change 
around the HDFS internals.

> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> This means that when a box stops, we have actually lost one replica of the 
> edits, as we lost both the regionserver and the datanode.
> As HDFS takes ~10 minutes to mark a node as dead, the dead node still appears 
> available when we try to read the blocks to recover. As such, the recovery 
> process is delayed by 60 seconds because the read will usually fail with a 
> socket timeout. If the file is still open for writing, this adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires an NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a workaround of some kind.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify the HDFS source code but 
> adds a proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows us to put all the code in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows us to target only the latest version, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419461#comment-13419461
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

Integrated to trunk.

Thanks for the patch, binlijin.

Thanks for the review, J-D.

> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread Aditya Kishore (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419491#comment-13419491
 ] 

Aditya Kishore commented on HBASE-6389:
---

bq. BTW what do Ri, C and Fi represent in the formula above ?

'n' is the number of tables in the cluster, *R*~i~ is the number of regions and 
*CF*~i~ is the number of column families in table 'i' ^\[1\]^.

1. [MSLAB is ON by 
default|http://hbase.apache.org/book/upgrade0.92.html#d1952e2965]
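
(The formula itself appears earlier in the thread; given these definitions it 
is presumably of the form sum over i = 1..n of R~i~ x CF~i~, i.e. the total 
number of stores, and hence memstores, in the cluster. This reconstruction is 
an assumption.)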

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> the default of 1) could help prevent assignment of all regions to one (or a 
> small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master will proceed immediately after the timeout 
> has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the master 
> will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated

[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419529#comment-13419529
 ] 

stack commented on HBASE-6389:
--

@Lars It's up to you. (But since you asked, fine by me ... I like what you are 
doing though, Aditya... thanks for the help.)

> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> the default of 1) could help prevent assignment of all regions to one (or a 
> small number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior has changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master will proceed immediately after the timeout 
> has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if fewer RSes have checked in with the Master, and the master 
> will proceed with the region assignment among these RSes alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of the timeout, 
> these conditions need to be modified as follows
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxToStart &&
> (lastCountChange+interval > now || timeout > slept || count < 
> minToStart)
>   ){
> ..
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JI

[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419535#comment-13419535
 ] 

stack commented on HBASE-6433:
--

@binlijin Why not change what getRemoteIp does internally (your patch copies 
much of the body of getRemoteIp)? Is it that getRemoteIp is used in places 
where the Call has not had its host address set yet?

> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419551#comment-13419551
 ] 

stack commented on HBASE-6435:
--

Yeah, we should do both. (I'd think that what's added to HDFS is more general 
than just this workaround scheme where the local replica gets moved to the end 
of the list; i.e. we add the ability to intercept the order returned by the NN 
and let a client-side policy alter it based on "local knowledge" if wanted. 
Could add other customizations, like being able to set the timeout per 
DFSInput/OutputStream as you've suggested up on the dev list, N.) Would be 
sweet if the 'hack' were available in the meantime while we wait on an hdfs 
release.

Looking at the patch, it looks like inventive hackery; good on you.

Do we have to do this in both master and regionserver? Can't we do it in the 
HFileSystem constructor, assuming it takes a Conf (or would that be too late)?

+  HFileSystem.addLocationOrderHack(conf);

Rather than have it called a reorderProxy, call it an HBaseDFSClient? Might 
want to add more customizations while waiting on the HDFS fix to arrive.
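
i.e., something like the following (using the hook name from the patch; the 
exact HFileSystem constructor signature here is an assumption):

{code:title=Sketch (assumed constructor shape)}
public HFileSystem(Configuration conf) throws IOException {
  this.fs = FileSystem.get(conf);
  // Install the NN block-order hook once, up front, instead of separately
  // in master and regionserver startup.
  HFileSystem.addLocationOrderHack(conf);
}
{code}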


> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> This means that when a box stops, we have actually lost one replica of the 
> edits, as we lost both the regionserver and the datanode.
> As HDFS takes ~10 minutes to mark a node as dead, the dead node still appears 
> available when we try to read the blocks to recover. As such, the recovery 
> process is delayed by 60 seconds because the read will usually fail with a 
> socket timeout. If the file is still open for writing, this adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires an NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a workaround of some kind.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify the HDFS source code but 
> adds a proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows us to put all the code in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows us to target only the latest version, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419552#comment-13419552
 ] 

Hudson commented on HBASE-6433:
---

Integrated in HBase-TRUNK #3155 (See 
[https://builds.apache.org/job/HBase-TRUNK/3155/])
HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java


> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419561#comment-13419561
 ] 

Zhihong Ted Yu commented on HBASE-6433:
---

Looking at the code, Connection has this member:
{code}
private InetAddress addr;
{code}
But I don't see where it is assigned. The following assignment is to a local 
variable:
{code}
  InetAddress addr = socket.getInetAddress();
{code}
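
For illustration, the shadowing being pointed out looks like this; presumably 
the intent was to assign the field so that the cached value is actually usable 
(a sketch of the pattern, not the committed code):

{code}
// As-is: declares a fresh local, so the Connection.addr field stays null.
InetAddress addr = socket.getInetAddress();

// Presumably intended: populate the field so later calls can reuse it.
this.addr = socket.getInetAddress();
{code}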

> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress would call getRemoteIp(), leading to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field. We don't need to go through the Socket to get this information.
> Without this patch it costs 4000ns; with this patch it costs 1600ns

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419579#comment-13419579
 ] 

stack commented on HBASE-6401:
--

@Nkeywal We need another patch on top of hdfs-3222?

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
> node crashes, the file that was being written is read by other processes to 
> replay the actions.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419581#comment-13419581
 ] 

nkeywal commented on HBASE-6435:


My thinking was that it could make it into an hdfs release that accepts changes 
to public interfaces. I fully agree with you Todd, we need to do our homework 
and push hdfs to ensure that what we need is understood and makes it into a 
release. On the other hand, if I look at how it worked for much simpler stuff 
like JUnit and surefire, our changes have been in their trunk for a few months 
and we're still waiting. These things take time. But I will do my homework on 
hdfs, I promise (I may actually need your help). The jira will be created next 
week, and if I get enough feedback I will propose a patch.

I was also wondering whether proposing native interceptors would be 
interesting for hdfs. Interceptors were available for a long time in an ORB 
called Orbix and were great to use. But they would need to be per conf, so 
they cannot work with static stuff.

bq. Do we have to do this in both master and regionserver? Can't we do it in 
the HFileSystem constructor, assuming it takes a Conf (or would that be too 
late)?
It can be put in pretty late, basically just before we start a recovery 
process. But we don't want it client side, so I will check this.

bq. Rather than have it called a reorderProxy, call it an HBaseDFSClient? 
Might want to add more customizations while waiting on the HDFS fix to arrive.
I've intercepted a lower-level call: I'm between the DFSClient and the 
namenode. This is because the DFSClient does more than just forward calls: it 
contains some logic. Hence going in front of the namenode. But yes, I could 
make it more generic.
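
As a rough illustration of that interception point (the shape below is an 
assumption for the example, not the patch itself): a java.lang.reflect dynamic 
proxy can wrap the ClientProtocol the DFSClient talks to and post-process any 
LocatedBlocks result before handing it back.

{code:title=Sketch of a namenode-facing reordering proxy (illustrative)}
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

import org.apache.hadoop.hdfs.protocol.ClientProtocol;
import org.apache.hadoop.hdfs.protocol.LocatedBlocks;

public class ReorderingNamenodeProxy implements InvocationHandler {
  private final ClientProtocol namenode;

  private ReorderingNamenodeProxy(ClientProtocol namenode) {
    this.namenode = namenode;
  }

  public static ClientProtocol wrap(ClientProtocol namenode) {
    return (ClientProtocol) Proxy.newProxyInstance(
        ClientProtocol.class.getClassLoader(),
        new Class<?>[] { ClientProtocol.class },
        new ReorderingNamenodeProxy(namenode));
  }

  public Object invoke(Object proxy, Method method, Object[] args)
      throws Throwable {
    Object result;
    try {
      result = method.invoke(namenode, args);
    } catch (InvocationTargetException e) {
      throw e.getCause();  // rethrow the namenode's real exception
    }
    if (result instanceof LocatedBlocks) {
      // Reorder here: e.g. push replicas on the dead RS's box to the end of
      // each block's location list (the dead server can be derived from the
      // WAL file name).
    }
    return result;
  }
}
{code}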


> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should start 
> the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> This means that when a box stops, we have actually lost one replica of the 
> edits, as we lost both the regionserver and the datanode.
> As HDFS takes ~10 minutes to mark a node as dead, the dead node still appears 
> available when we try to read the blocks to recover. As such, the recovery 
> process is delayed by 60 seconds because the read will usually fail with a 
> socket timeout. If the file is still open for writing, this adds an extra 20s 
> plus a risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires an NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a workaround of some kind.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify the HDFS source code but 
> adds a proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows us to put all the code in HBase, simplifying 
> dependency management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows us to target only the latest version, and this could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6401) HBase may lose edits after a crash if used with HDFS 1.0.3 or older

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419585#comment-13419585
 ] 

nkeywal commented on HBASE-6401:


On 1.x, yes, I think that backporting hdfs-3222 won't be enough. On 2.0, it 
seems it's ok, even if I can't find the good soul who fixed it. As I can't find 
a jira, I can create a new one & propose a fix specific to branch 1. 

> HBase may lose edits after a crash if used with HDFS 1.0.3 or older
> ---
>
> Key: HBASE-6401
> URL: https://issues.apache.org/jira/browse/HBASE-6401
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 0.96.0
> Environment: all
>Reporter: nkeywal
>Priority: Critical
> Attachments: TestReadAppendWithDeadDN.java
>
>
> This comes from an hdfs bug, fixed in some hdfs versions. I haven't found the 
> hdfs jira for this.
> Context: the HBase Write-Ahead-Log feature. This uses hdfs append. If the 
> node crashes, the file that was being written is read by other processes to 
> replay the actions.
> - So we have in hdfs one (dead) process writing with another process reading.
> - But, despite the call to syncFs, we don't always see the data when we have 
> a dead node. It seems to be because the call in DFSClient#updateBlockInfo 
> ignores the ipc errors and sets the length to 0.
> - So we may miss all the writes to the last block if we try to connect to the 
> dead DN.
> hdfs 1.0.3, branch-1 or branch-1-win: we have the issue
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-1/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java?revision=1359853&view=markup
> hdfs branch-2 or trunk: we should not have the issue (but not tested)
> http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java?view=markup
> The attached test will fail ~50% of the time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419588#comment-13419588
 ] 

Todd Lipcon commented on HBASE-6435:


I think there's a good motivation to add this kind of API generally to 
DFSInputStream. In particular, I think something like the following:

public List<Replica> getAvailableReplica(long pos); // return the list of 
available replicas at given file offset, in priority order
public void prioritizeReplica(Replica r); // move given replica to front of list
public void blacklistReplica(Replica r); // move replica to back of list
(or something of this sort)

The Replica API would then expose the datanode IDs (and after HDFS-3672, the 
disk ID).
So, in HBase we could simply open the file, enumerate the replicas, 
deprioritize the one on the suspected node, and move on with the normal code 
paths.
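
To make the intended usage concrete, here is a minimal sketch of the 
recovery-side code, assuming hypothetical Replica and stream interfaces with 
the shapes proposed above (nothing here exists in HDFS today):
{code}
import java.util.List;

// Hypothetical shapes matching the proposal above; not a real HDFS API.
interface Replica {
  String getDatanodeId();
}

interface ReplicaAwareInputStream {
  List<Replica> getAvailableReplica(long pos);
  void blacklistReplica(Replica r);
}

class WalRecoverySketch {
  // Deprioritize every replica hosted on the suspected-dead datanode, then
  // let the normal read path run with the reordered list.
  static void deprioritizeSuspectNode(ReplicaAwareInputStream in,
      String suspectDnId) {
    for (Replica r : in.getAvailableReplica(0L)) {
      if (suspectDnId.equals(r.getDatanodeId())) {
        in.blacklistReplica(r); // moves it to the back of the priority list
      }
    }
  }
}
{code}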

> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should 
> start the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> It means that when a box stops, we've actually lost one replica of the edits, 
> as we lost both the regionserver and the datanode.
> As HDFS marks a node as dead only after ~10 minutes, it still appears 
> available when we try to read the blocks to recover. As such, we delay the 
> recovery process by 60 seconds, as the read will usually fail with a socket 
> timeout. If the file is still opened for writing, it adds an extra 20s plus a 
> risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires a NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code 
> change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a kind of workaround.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify HDFS source code but adds a 
> proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows all the code to live in HBase, simplifying dependency 
> management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows targeting only the latest version, and it could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419608#comment-13419608
 ] 

nkeywal commented on HBASE-6435:


I understand that you don't want to expose the internals nor something like 
DatanodeInfo. The same type of API would be useful for the output stream, 
putting priorities on nodes (and so reusing some knowledge about the dead 
nodes, or, for the wal, removing the local writes). It's simple and efficient.

With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when a node list is cleared 
after they all failed. But maybe that can be changed as well.



> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should 
> start the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> It means that when a box stops, we've actually lost one replica of the edits, 
> as we lost both the regionserver and the datanode.
> As HDFS marks a node as dead only after ~10 minutes, it still appears 
> available when we try to read the blocks to recover. As such, we delay the 
> recovery process by 60 seconds, as the read will usually fail with a socket 
> timeout. If the file is still opened for writing, it adds an extra 20s plus a 
> risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires a NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code 
> change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a kind of workaround.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify HDFS source code but adds a 
> proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows all the code to live in HBase, simplifying dependency 
> management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows targeting only the latest version, and it could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419623#comment-13419623
 ] 

Todd Lipcon commented on HBASE-6435:


bq. With the current DFSClient implementation, a callback would ease cases like 
opening a file already opened for writing, or when a node list is cleared 
after they all failed. But maybe that can be changed as well.

Can you explain further what you mean here? What would you use these callbacks 
for?

> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should 
> start the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> It means that when a box stops, we've actually lost one replica of the edits, 
> as we lost both the regionserver and the datanode.
> As HDFS marks a node as dead only after ~10 minutes, it still appears 
> available when we try to read the blocks to recover. As such, we delay the 
> recovery process by 60 seconds, as the read will usually fail with a socket 
> timeout. If the file is still opened for writing, it adds an extra 20s plus a 
> risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires a NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code 
> change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a kind of workaround.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify HDFS source code but adds a 
> proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. None of these solutions 
> is very clean. 
> - Adding a proxy allows all the code to live in HBase, simplifying dependency 
> management.
> Nevertheless, it would be better to have this in HDFS. But this solution 
> allows targeting only the latest version, and it could allow minimal 
> interface changes such as non-static methods.
> Moreover, writing the blocks to a non-local DN would be an even better 
> long-term solution.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6321:
--

Attachment: HBASE-6321-0.94.patch

Had a stab at this. What I figured is that the fetching of UUIDs was done 
outside of {{ReplicationZookeeper}}, so it was missing the functionality from 
that class (you can also see the feature envy that was going on there).

I refactored the ugly UUID stuff in {{ReplicationSource.run}} into 
{{ReplicationZookeeper.getPeerUUID}}. There I needed to handle the session 
expiration issues, so I refactored that from another method into 
{{reconnectPeer}}. Now that the issue is handled, the possibility of a null 
UUID remained if the peer wasn't reachable, so I added a loop in 
{{ReplicationSource}}.

Finally I saw that we were doing the UUID dance in {{ReplicationSource.init}} 
for the current cluster, so I pushed that to 
{{ReplicationZookeeper.getUUIDForCluster}} and refactored {{getPeerUUID}} to 
use it.

The code should be clearer and more reliable.
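
As a rough illustration of the loop described above (hypothetical names; see 
the attached patch for the real code):
{code}
import java.util.UUID;

// Hypothetical sketch of the retry added in ReplicationSource: keep asking the
// peer cluster for its UUID until it answers, sleeping between attempts. The
// session-expiration handling is assumed to live behind getPeerUUID().
class PeerUuidLoopSketch {
  interface PeerClusterZk {
    UUID getPeerUUID(); // returns null if the peer is not reachable
  }

  UUID waitForPeerUuid(PeerClusterZk peer, long sleepMs)
      throws InterruptedException {
    UUID peerId = peer.getPeerUUID();
    while (peerId == null) {   // the real code would also bail out
      Thread.sleep(sleepMs);   // when the source is being stopped
      peerId = peer.getPeerUUID();
    }
    return peerId;
  }
}
{code}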

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6321-0.94.patch
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6321:
--

Fix Version/s: (was: 0.90.8)
 Assignee: Jean-Daniel Cryans

Assigning this to me and removing the 0.90 target since I found out that that 
part of the code was added in 0.92.

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6321-0.94.patch
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6321) ReplicationSource dies reading the peer's id

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419633#comment-13419633
 ] 

Jean-Daniel Cryans commented on HBASE-6321:
---

Oh I forgot to mention that I ran TestReplication/Source/Manager 2 times each 
and they all passed.

> ReplicationSource dies reading the peer's id
> 
>
> Key: HBASE-6321
> URL: https://issues.apache.org/jira/browse/HBASE-6321
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.2
>
> Attachments: HBASE-6321-0.94.patch
>
>
> This is what I saw:
> {noformat}
> 2012-07-01 05:04:01,638 ERROR 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing 
> source 8 because an error occurred: Could not read peer's cluster id
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired for /va1-backup/hbaseid
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1021)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:154)
> at 
> org.apache.hadoop.hbase.zookeeper.ZKUtil.checkExists(ZKUtil.java:259)
> at 
> org.apache.hadoop.hbase.zookeeper.ClusterId.readClusterIdZNode(ClusterId.java:61)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:253)
> {noformat}
> The session should just be reopened.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6435) Reading WAL files after a recovery leads to time lost in HDFS timeouts when using dead datanodes

2012-07-20 Thread nkeywal (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419646#comment-13419646
 ] 

nkeywal commented on HBASE-6435:


If I can, I would prefer to keep the existing interface.


Today, when you open a file, there is a call to a datanode if the file is also 
opened for writing somewhere. In HBase, we want the priorities to be taken into 
account during this opening, as we have a guess that one of these datanodes may 
be dead.

So either I register a callback that the DFSClient will call before using its 
list, or I change the 'open' interface to add the possibility of providing 
the list of replicas. Same thing for chooseDataNode called from blockSeekTo: 
even if we have a list at the beginning, this list is recreated during a read 
as part of the retry process (in case the NN discovered new replicas on new 
datanodes).

If we go with a callback, we would offer this service:
{noformat}
class ReplicaSet {
  // return the list of available replicas at given file offset, in priority order
  public List<Replica> getAvailableReplica(long pos);
  // move given replica to front of list
  public void prioritizeReplica(Replica r);
  // move replica to back of list
  public void blacklistReplica(Replica r);
}
{noformat}


The client would need to implement this interface:
{noformat}
// Implement this interface and provide it to the DFSClient during its
// construction to manage the replica ordering
interface OrganizeReplicaSet {
  void organize(String fileName, ReplicaSet rs);
}
{noformat}

And the DFSClient code would become:
{noformat}
LocatedBlocks callGetBlockLocations(ClientProtocol namenode,
    String src, long start, long length) throws IOException {
  LocatedBlocks lbs = namenode.getBlockLocations(src, start, length);
  if (organizeReplicaSet != null) {
    ReplicaSet rs = lbs.getAsReplicaSet();
    try {
      organizeReplicaSet.organize(src, rs);
    } catch (Throwable t) {
      throw new IOException("OrganizeReplicaSet failed. class="
          + organizeReplicaSet.getClass(), t);
    }
    return new LocatedBlocks(rs);
  }
  return lbs;
}
{noformat}

This is called from the DFSInputStream constructor in openInfo today.

In real life I would try to use the class ReplicaSet as an interface on the 
internal LocatedBlock(s) to limit the number of objects created. The callback 
could also be given as a parameter to the DFSInputStream constructor if there 
is a specific rule to apply...
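
To round out the sketch, here is a hypothetical HBase-side implementation of 
OrganizeReplicaSet, reusing the shapes above; it assumes Replica exposes the 
datanode's host name, which is not part of the sketch above. It simply pushes 
replicas on a suspected-dead host to the back of the list. None of this is 
committed code.
{code}
import java.util.List;

// Hypothetical sketch only. Replica, ReplicaSet and OrganizeReplicaSet mirror
// the interfaces sketched earlier in this comment; getHostName() is assumed.
interface Replica { String getHostName(); }

interface ReplicaSet {
  List<Replica> getAvailableReplica(long pos);
  void blacklistReplica(Replica r);
}

interface OrganizeReplicaSet {
  void organize(String fileName, ReplicaSet rs);
}

// HBase would hand this to the DFSClient at construction time: every replica
// hosted on the suspected-dead box is pushed to the back of the priority list.
class DeadNodeReorder implements OrganizeReplicaSet {
  private final String deadHost;

  DeadNodeReorder(String deadHost) { this.deadHost = deadHost; }

  public void organize(String fileName, ReplicaSet rs) {
    for (Replica r : rs.getAvailableReplica(0L)) {
      if (deadHost.equals(r.getHostName())) {
        rs.blacklistReplica(r);
      }
    }
  }
}
{code}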


> Reading WAL files after a recovery leads to time lost in HDFS timeouts when 
> using dead datanodes
> 
>
> Key: HBASE-6435
> URL: https://issues.apache.org/jira/browse/HBASE-6435
> Project: HBase
>  Issue Type: Improvement
>  Components: master, regionserver
>Affects Versions: 0.96.0
>Reporter: nkeywal
>Assignee: nkeywal
> Attachments: 6435.unfinished.patch
>
>
> HBase writes a Write-Ahead-Log to recover from hardware failure.
> This log is written with 'append' on hdfs.
> Through ZooKeeper, HBase usually gets informed within 30s that it should 
> start the recovery process. 
> This means reading the Write-Ahead-Log to replay the edits on the other 
> servers.
> In standard deployments, HBase processes (regionservers) are deployed on the 
> same boxes as the datanodes.
> It means that when a box stops, we've actually lost one replica of the edits, 
> as we lost both the regionserver and the datanode.
> As HDFS marks a node as dead only after ~10 minutes, it still appears 
> available when we try to read the blocks to recover. As such, we delay the 
> recovery process by 60 seconds, as the read will usually fail with a socket 
> timeout. If the file is still opened for writing, it adds an extra 20s plus a 
> risk of losing edits if we connect with ipc to the dead DN.
> Possible solutions are:
> - shorter dead-datanode detection by the NN. Requires a NN code change.
> - better dead-datanode management in the DFSClient. Requires a DFS code 
> change.
> - NN customisation to write the WAL files on another DN instead of the local 
> one.
> - reordering the blocks returned by the NN on the client side to put the 
> blocks on the same DN as the dead RS at the end of the priority queue. 
> Requires a DFS code change or a kind of workaround.
> The solution retained is the last one. Compared to what was discussed on the 
> mailing list, the proposed patch will not modify HDFS source code but adds a 
> proxy. This is for two reasons:
> - Some HDFS functions managing block order are static 
> (MD5MD5CRC32FileChecksum). Implementing the hook in the DFSClient would 
> require partially implementing the fix, changing the DFS interface to make 
> this function non-static, or making the hook static. 

[jira] [Commented] (HBASE-5985) TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419648#comment-13419648
 ] 

Hudson commented on HBASE-5985:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-5985 TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0 (Revision 
1363561)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/client/TestMetaMigrationRemovingHTD.java


> TestMetaMigrationRemovingHTD failed with HADOOP 2.0.0
> -
>
> Key: HBASE-5985
> URL: https://issues.apache.org/jira/browse/HBASE-5985
> Project: HBase
>  Issue Type: Test
>  Components: test
>Affects Versions: 0.96.0
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: hbase-5985.patch
>
>
> ---
> Test set: org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 3.448 sec <<< 
> FAILURE!
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD  Time elapsed: 0 
> sec  <<< ERROR!
> java.io.IOException: Failed put; errcode=1
> at 
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.doFsCommand(TestMetaMigrationRemovingHTD.java:124)
> at 
> org.apache.hadoop.hbase.client.TestMetaMigrationRemovingHTD.setUpBeforeClass(TestMetaMigrationRemovingHTD.java:80)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> at 
> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6397) [hbck] print out bulk load commands for sidelined regions if necessary

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419654#comment-13419654
 ] 

Hudson commented on HBASE-6397:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6397 [hbck] print out bulk load commands for sidelined regions if 
necessary (Revision 1362247)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java


> [hbck] print out bulk load commands for sidelined regions if necessary
> --
>
> Key: HBASE-6397
> URL: https://issues.apache.org/jira/browse/HBASE-6397
> Project: HBase
>  Issue Type: Bug
>  Components: hbck
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
>Priority: Trivial
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6397-trunk.patch
>
>
> It's better to print out in the log the command line to bulk load back 
> sidelined regions, if any.
> Separating this out from HBASE-6392 since it is a different issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6426) Add Hadoop 2.0.x profile to 0.92+

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419647#comment-13419647
 ] 

Hudson commented on HBASE-6426:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6426 Add Hadoop 2.0.x profile to 0.92+ (Revision 1363211)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/pom.xml


> Add Hadoop 2.0.x profile to 0.92+
> -
>
> Key: HBASE-6426
> URL: https://issues.apache.org/jira/browse/HBASE-6426
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.94.1
>
> Attachments: 6426.txt
>
>
> 0.96 already has a Hadoop-2.0 build profile. Let's add this to 0.92 and 0.94 
> as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6389) Modify the conditions to ensure that Master waits for sufficient number of Region Servers before starting region assignments

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419649#comment-13419649
 ] 

Hudson commented on HBASE-6389:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6389 revert (Revision 1363193)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/regionserver/TestRSKilledWhenMasterInitializing.java


> Modify the conditions to ensure that Master waits for sufficient number of 
> Region Servers before starting region assignments
> 
>
> Key: HBASE-6389
> URL: https://issues.apache.org/jira/browse/HBASE-6389
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Affects Versions: 0.94.0, 0.96.0
>Reporter: Aditya Kishore
>Assignee: Aditya Kishore
>Priority: Critical
> Fix For: 0.96.0, 0.94.2
>
> Attachments: HBASE-6389_trunk.patch, HBASE-6389_trunk.patch, 
> HBASE-6389_trunk.patch, org.apache.hadoop.hbase.TestZooKeeper-output.txt, 
> testReplication.jstack
>
>
> Continuing from HBASE-6375.
> It seems I was mistaken in my assumption that changing the value of 
> "hbase.master.wait.on.regionservers.mintostart" to a sufficient number (from 
> default of 1) can help prevent assignment of all regions to one (or a small 
> number of) region server(s).
> While this was the case in 0.90.x and 0.92.x, the behavior changed from 
> 0.94.0 onwards to address HBASE-4993.
> From 0.94.0 onwards, the Master will proceed immediately after the timeout 
> has lapsed, even if "hbase.master.wait.on.regionservers.mintostart" has not 
> been reached.
> Reading the current conditions of waitForRegionServers() clarifies it
> {code:title=ServerManager.java (trunk rev:1360470)}
> 
> 581 /**
> 582  * Wait for the region servers to report in.
> 583  * We will wait until one of this condition is met:
> 584  *  - the master is stopped
> 585  *  - the 'hbase.master.wait.on.regionservers.timeout' is reached
> 586  *  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
> 587  *region servers is reached
> 588  *  - the 'hbase.master.wait.on.regionservers.mintostart' is reached 
> AND
> 589  *   there have been no new region server in for
> 590  *  'hbase.master.wait.on.regionservers.interval' time
> 591  *
> 592  * @throws InterruptedException
> 593  */
> 594 public void waitForRegionServers(MonitoredTask status)
> 595 throws InterruptedException {
> 
> 
> 612   while (
> 613 !this.master.isStopped() &&
> 614   slept < timeout &&
> 615   count < maxToStart &&
> 616   (lastCountChange+interval > now || count < minToStart)
> 617 ){
> 
> {code}
> So with the current conditions, the wait will end as soon as the timeout is 
> reached, even if a smaller number of RSes have checked in with the Master, 
> and the master will proceed with the region assignment among these RSes 
> alone.
> As mentioned in 
> -[HBASE-4993|https://issues.apache.org/jira/browse/HBASE-4993?focusedCommentId=13237196#comment-13237196]-,
>  and I concur, this could have a disastrous effect in a large cluster, 
> especially now that MSLAB is turned on.
> To enforce the required quorum as specified by 
> "hbase.master.wait.on.regionservers.mintostart" irrespective of timeout, 
> these conditions need to be modified as follows:
> {code:title=ServerManager.java}
> ..
>   /**
>* Wait for the region servers to report in.
>* We will wait until one of this condition is met:
>*  - the master is stopped
>*  - the 'hbase.master.wait.on.regionservers.maxtostart' number of
>*region servers is reached
>*  - the 'hbase.master.wait.on.regionservers.mintostart' is reached AND
>*   there have been no new region server in for
>*  'hbase.master.wait.on.regionservers.interval' time AND
>*   the 'hbase.master.wait.on.regionservers.timeout' is reached
>*
>* @throws InterruptedException
>*/
>   public void waitForRegionServers(MonitoredTask status)
> ..
> ..
> int minToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.mintostart", 1);
> int maxToStart = this.master.getConfiguration().
> getInt("hbase.master.wait.on.regionservers.maxtostart", 
> Integer.MAX_VALUE);
> if (maxToStart < minToStart) {
>   maxToStart = minToStart;
> }
> ..
> ..
> while (
>   !this.master.isStopped() &&
> count < maxTo

[jira] [Commented] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419651#comment-13419651
 ] 

Hudson commented on HBASE-6406:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6406 Remove TestReplicationPeer (Revision 1363213)

 Result = FAILURE
larsh : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/replication/TestReplicationPeer.java


> TestReplicationPeer.testResetZooKeeperSession and 
> TestZooKeeper.testClientSessionExpired fail frequently
> 
>
> Key: HBASE-6406
> URL: https://issues.apache.org/jira/browse/HBASE-6406
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.1
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack
>
>
> Looking back through the 0.94 test runs, these two tests accounted for 11 of 
> 34 failed tests.
> They should be fixed or (temporarily) disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6319) ReplicationSource can call terminate on itself and deadlock

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419650#comment-13419650
 ] 

Hudson commented on HBASE-6319:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> ReplicationSource can call terminate on itself and deadlock
> ---
>
> Key: HBASE-6319
> URL: https://issues.apache.org/jira/browse/HBASE-6319
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.94.1
>
> Attachments: HBASE-6319-0.92.patch
>
>
> In a few places in the ReplicationSource code calls terminate on itself which 
> is a problem since in terminate() we wait on that thread to die.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4956) Control direct memory buffer consumption by HBaseClient

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419652#comment-13419652
 ] 

Hudson commented on HBASE-4956:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-4956 Control direct memory buffer consumption by HBaseClient (Bob 
Copeland) (Revision 1363533)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/client/Result.java


> Control direct memory buffer consumption by HBaseClient
> ---
>
> Key: HBASE-4956
> URL: https://issues.apache.org/jira/browse/HBASE-4956
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Yu
>Assignee: Bob Copeland
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 4956.txt, thread_get.rb
>
>
> As Jonathan explained here 
> https://groups.google.com/group/asynchbase/browse_thread/thread/c45bc7ba788b2357?pli=1
> , the standard hbase client inadvertently consumes a large amount of direct 
> memory.
> We should consider using netty for NIO-related tasks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6392) UnknownRegionException blocks hbck from sideline big overlap regions

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419657#comment-13419657
 ] 

Hudson commented on HBASE-6392:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6392 UnknownRegionException blocks hbck from sideline big overlap 
regions (Revision 1363202)

 Result = FAILURE
jxiang : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java


> UnknownRegionException blocks hbck from sideline big overlap regions
> 
>
> Key: HBASE-6392
> URL: https://issues.apache.org/jira/browse/HBASE-6392
> Project: HBase
>  Issue Type: Bug
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
> Attachments: 6392-0.90.patch, 6392-trunk.patch, 6392-trunk_v2.patch, 
> 6392_0.92.patch
>
>
> Before sidelining a big overlap region, hbck first tries to close it and 
> offline it. However, sometimes it throws NotServingRegion or 
> UnknownRegionException.
> It could be because the region is not open/assigned at all, or some other 
> issue.
> We should figure out why and fix it.
> By the way, it's better to print out in the log the command line to bulk load 
> back sidelined regions, if any. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6420) Gracefully shutdown logsyncer

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419653#comment-13419653
 ] 

Hudson commented on HBASE-6420:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6420 Gracefully shutdown logsyncer (Revision 1363416)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java


> Gracefully shutdown logsyncer
> -
>
> Key: HBASE-6420
> URL: https://issues.apache.org/jira/browse/HBASE-6420
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Reporter: Jimmy Xiang
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: 6420-trunk.patch
>
>
> Currently, when closing an HLog, logSyncerThread is interrupted. The 
> logSyncer could be in the middle of syncing the writer. We should avoid 
> interrupting the sync.
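
For illustration, a minimal sketch of the graceful alternative, assuming a 
simplified syncer thread (this is not the actual HLog.LogSyncer code): signal 
shutdown with a volatile flag and join, rather than interrupting a thread that 
may be mid-sync.
{code}
// Hypothetical sketch, not the actual HLog.LogSyncer: stop the syncer without
// interrupt() so an in-flight sync of the writer is never cut short.
class LogSyncerSketch extends Thread {
  private volatile boolean closing = false;

  public void run() {
    while (!closing) {
      syncPendingEdits(); // may block, but is never interrupted mid-call
    }
    syncPendingEdits();   // one final flush before the thread exits
  }

  void shutdown() throws InterruptedException {
    closing = true; // ask the loop to finish its current pass
    join();         // wait for the final sync to complete
  }

  private void syncPendingEdits() {
    // writer.sync() would go here in the real code
  }
}
{code}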

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6325) [replication] Race in ReplicationSourceManager.init can initiate a failover even if the node is alive

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419658#comment-13419658
 ] 

Hudson commented on HBASE-6325:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6319  ReplicationSource can call terminate on itself and deadlock
HBASE-6325  [replication] Race in ReplicationSourceManager.init can initiate a 
failover even if the node is alive (Revision 1363570)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java


> [replication] Race in ReplicationSourceManager.init can initiate a failover 
> even if the node is alive
> -
>
> Key: HBASE-6325
> URL: https://issues.apache.org/jira/browse/HBASE-6325
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.90.6, 0.92.1, 0.94.0
>Reporter: Jean-Daniel Cryans
>Assignee: Jean-Daniel Cryans
> Fix For: 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6325-0.92-v2.patch, HBASE-6325-0.92.patch
>
>
> Yet another bug found during the leap second madness: it's possible to miss 
> the registration of new region servers, so that in 
> ReplicationSourceManager.init we start the failover of a live and replicating 
> region server. I don't think there's data loss, but the RS that's being 
> failed over will die on:
> {noformat}
> 2012-07-01 06:25:15,604 FATAL 
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server 
> sv4r23s48,10304,1341112194623: Writing replication status
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for 
> /hbase/replication/rs/sv4r23s48,10304,1341112194623/4/sv4r23s48%2C10304%2C1341112194623.1341112195369
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1246)
> at 
> org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:372)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:655)
> at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:697)
> at 
> org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:470)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:607)
> at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:368)
> {noformat}
> It seems to me that just refreshing {{otherRegionServers}} after getting the 
> list of {{currentReplicators}} would be enough to fix this.
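
A minimal sketch of that ordering fix, with hypothetical accessors standing in 
for the real ReplicationSourceManager/ZooKeeper code: read the replicators 
first, refresh the live-server list second, and only fail over replicators 
absent from the refreshed list.
{code}
import java.util.List;

// Hypothetical sketch of the fix suggested above; names are illustrative.
class FailoverInitSketch {
  interface ReplicationZk {
    List<String> listReplicators();       // RSes with replication queues
    List<String> listLiveRegionServers(); // currently registered RSes
  }

  void init(ReplicationZk zk) {
    List<String> currentReplicators = zk.listReplicators();
    // Refresh *after* listing replicators, so a server that registered in
    // between is not mistaken for a dead one.
    List<String> otherRegionServers = zk.listLiveRegionServers();
    for (String rs : currentReplicators) {
      if (!otherRegionServers.contains(rs)) {
        transferQueues(rs); // only genuinely dead servers are failed over
      }
    }
  }

  private void transferQueues(String deadRs) {
    // the failover worker would take over the dead server's queues here
  }
}
{code}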

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6382) Upgrade Jersey to 1.8 to match Hadoop 1 and 2

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419656#comment-13419656
 ] 

Hudson commented on HBASE-6382:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-6382 Upgrade Jersey to 1.8 to match Hadoop 1 and 2 (David S. Wang) 
(Revision 1362308)

 Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/pom.xml


> Upgrade Jersey to 1.8 to match Hadoop 1 and 2
> -
>
> Key: HBASE-6382
> URL: https://issues.apache.org/jira/browse/HBASE-6382
> Project: HBase
>  Issue Type: Improvement
>  Components: rest
>Affects Versions: 0.90.7, 0.92.2, 0.96.0, 0.94.2
>Reporter: David S. Wang
>Assignee: David S. Wang
> Fix For: 0.90.7, 0.92.2, 0.96.0, 0.94.1
>
> Attachments: HBASE-6382-trunk.patch
>
>
> Upgrade Jersey dependency from 1.4 to 1.8 to match Hadoop dependencies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5966) MapReduce based tests broken on Hadoop 2.0.0-alpha

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419655#comment-13419655
 ] 

Hudson commented on HBASE-5966:
---

Integrated in HBase-0.94-security #44 (See 
[https://builds.apache.org/job/HBase-0.94-security/44/])
HBASE-5966 MapReduce based tests broken on Hadoop 2.0.0-alpha (Gregory 
Chanan) (Revision 1363586)

 Result = FAILURE
jxiang : 
Files : 
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java


> MapReduce based tests broken on Hadoop 2.0.0-alpha
> --
>
> Key: HBASE-5966
> URL: https://issues.apache.org/jira/browse/HBASE-5966
> Project: HBase
>  Issue Type: Bug
>  Components: mapred, mapreduce, test
>Affects Versions: 0.94.0, 0.96.0
> Environment: Hadoop 2.0.0-alpha-SNAPSHOT, HBase 0.94.0-SNAPSHOT, 
> Ubuntu 12.04 LTS (GNU/Linux 3.2.0-24-generic x86_64)
>Reporter: Andrew Purtell
>Assignee: Jimmy Xiang
> Fix For: 0.96.0, 0.94.1
>
> Attachments: HBASE-5966-1.patch, HBASE-5966-94.patch, 
> HBASE-5966.patch, hbase-5966.patch
>
>
> Some fairly recent change in Hadoop 2.0.0-alpha has broken our MapReduce test 
> rigging. Below is a representative error, which can easily be reproduced with:
> {noformat}
> mvn -PlocalTests -Psecurity \
>   -Dhadoop.profile=23 -Dhadoop.version=2.0.0-SNAPSHOT \
>   clean test \
>   -Dtest=org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> {noformat}
> And the result:
> {noformat}
> ---
>  T E S T S
> ---
> Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> ---
> Test set: org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
> ---
> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 54.292 sec 
> <<< FAILURE!
> testMultiRegionTable(org.apache.hadoop.hbase.mapreduce.TestTableMapReduce)  
> Time elapsed: 21.935 sec  <<< ERROR!
> java.lang.reflect.UndeclaredThrowableException
>   at 
> org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl.unwrapAndThrowException(YarnRemoteExceptionPBImpl.java:135)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl.getNewApplication(ClientRMProtocolPBClientImpl.java:134)
>   at 
> org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:183)
>   at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:216)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:339)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1226)
>   at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1223)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:416)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1223)
>   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1244)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.runTestOnTable(TestTableMapReduce.java:151)
>   at 
> org.apache.hadoop.hbase.mapreduce.TestTableMapReduce.testMultiRegionTable(TestTableMapReduce.java:129)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:616)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
>   at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:47)
>   at org.junit.rules.RunRules.evaluate(RunRules.java:18)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:

[jira] [Resolved] (HBASE-6310) -ROOT- corruption when .META. is using the old encoding scheme

2012-07-20 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-6310.
---

   Resolution: Invalid
Fix Version/s: (was: 0.94.2)
   (was: 0.96.0)

I'm resolving this as invalid. I was thrown in the wrong direction by what I 
thought were old/new .META. rows (they in fact never changed), whereas it was 
a .META. region from almost 3 years ago that was brought back to life. It 
could have been something like HBASE-6417 that happened, but since I don't 
have those logs anymore I can't be 100% sure until I reproduce the issue.

> -ROOT- corruption when .META. is using the old encoding scheme
> --
>
> Key: HBASE-6310
> URL: https://issues.apache.org/jira/browse/HBASE-6310
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 0.94.0
>Reporter: Jean-Daniel Cryans
>Priority: Blocker
>
> We're still working on the root cause here, but after the leap second 
> armageddon we had a hard time getting our 0.94 cluster back up. This is what 
> we saw in the logs until the master died by itself:
> {noformat}
> 2012-07-01 23:01:52,149 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> locateRegionInMeta parentTable=-ROOT-,
> metaLocation={region=-ROOT-,,0.70236052, hostname=sfor3s28,
> port=10304}, attempt=16 of 100 failed; retrying after sleep of 32000
> because: HRegionInfo was null or empty in -ROOT-,
> row=keyvalues={.META.,,1259448304806/info:server/1341124914705/Put/vlen=14/ts=0,
> .META.,,1259448304806/info:serverstartcode/1341124914705/Put/vlen=8/ts=0}
> {noformat}
> (it's strange that we retry this)
> This was really misleading because I could see the regioninfo in a scan:
> {noformat}
> hbase(main):002:0> scan '-ROOT-'
> ROW   COLUMN+CELL
>  .META.,,1column=info:regioninfo,
> timestamp=1331755381142, value={NAME => '.META.,,1', STARTKEY => '',
> ENDKEY => '', ENCODED => 1028785192,}
>  .META.,,1column=info:server,
> timestamp=1341183448693, value=sfor3s40:10304
>  .META.,,1
> column=info:serverstartcode, timestamp=1341183448693,
> value=1341183444689
>  .META.,,1column=info:v,
> timestamp=1331755419291, value=\x00\x00
>  .META.,,1259448304806column=info:server,
> timestamp=1341124914705, value=sfor3s24:10304
>  .META.,,1259448304806
> column=info:serverstartcode, timestamp=1341124914705,
> value=1341124455863
> {noformat}
> Except that the devil is in the details: ".META.,,1" is not 
> ".META.,,1259448304806". Basically, something writes to .META. by directly 
> creating the row key without caring if the row is in the old format. I did a 
> deleteall in the shell and it fixed the issue... until some time later it was 
> stuck again because the edits reappeared (still not sure why). This time the 
> PostOpenDeployTasksThreads were stuck in the RS trying to update .META., but 
> there was no logging (saw it with a jstack). I deleted the row again to make 
> it work.
> I'm marking this as a blocker against 0.94.2 since we're trying to get 0.94.1 
> out, but I wouldn't recommend upgrading to 0.94 if your cluster was created 
> before 0.89.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6228) Fixup daughters twice cause daughter region assigned twice

2012-07-20 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419691#comment-13419691
 ] 

Jimmy Xiang commented on HBASE-6228:


I'd like to fix this in HBASE-6381 by making sure SSH blocks on 
AM.processServerShutdown until the master has joined the cluster and fixed 
missing daughters.

> Fixup daughters twice  cause daughter region assigned twice
> ---
>
> Key: HBASE-6228
> URL: https://issues.apache.org/jira/browse/HBASE-6228
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: chunhui shen
>Assignee: chunhui shen
> Fix For: 0.96.0
>
> Attachments: HBASE-6228.patch, HBASE-6228v2.patch, 
> HBASE-6228v2.patch, HBASE-6228v3.patch, HBASE-6228v4.patch
>
>
> First, how can fixing up daughters twice happen?
> 1. We will fixupDaughters at the end of HMaster#finishInitialization.
> 2. ServerShutdownHandler will fixupDaughters when reassigning regions through 
> ServerShutdownHandler#processDeadRegion.
> When we fixupDaughters, we add the daughters to .META., but that couldn't 
> prevent the above case, because of FindDaughterVisitor.
> The detail is as follows:
> Suppose region A is a split parent region, and its daughter region B is 
> missing.
> 1. First, the ServerShutdownHandler thread fixes up the daughter, so it adds 
> daughter region B to .META. with serverName=null, and assigns the daughter.
> 2. Then, the Master's initialization thread will also find that daughter 
> region B is missing and assign it. That is because FindDaughterVisitor 
> considers a daughter missing if its serverName=null.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6433) Improve HBaseServer#getRemoteAddress by utilizing HBaseServer.Connection.hostAddress

2012-07-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419692#comment-13419692
 ] 

Hudson commented on HBASE-6433:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #101 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/101/])
HBASE-6433 Improve HBaseServer#getRemoteAddress by utilizing 
HBaseServer.Connection.hostAddress (binlijin) (Revision 1363905)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/HBaseServer.java


> Improve HBaseServer#getRemoteAddress by utilizing 
> HBaseServer.Connection.hostAddress
> 
>
> Key: HBASE-6433
> URL: https://issues.apache.org/jira/browse/HBASE-6433
> Project: HBase
>  Issue Type: Improvement
>Reporter: binlijin
>Assignee: binlijin
>Priority: Minor
> Fix For: 0.96.0
>
> Attachments: 6433-getRemoteAddress-trunk.txt, HBASE-6433-90.patch, 
> HBASE-6433-92.patch, HBASE-6433-94.patch, HBASE-6433-trunk.patch
>
>
> Currently, HBaseServer#getRemoteAddress calls getRemoteIp(), which leads to 
> call.connection.socket.getInetAddress().
> The host address is actually stored in the HBaseServer.Connection.hostAddress 
> field, so we do not need to go through the Socket to get this information.
> Without this patch the call costs about 4000ns; with this patch it costs 
> about 1600ns.
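
As a minimal sketch of the optimization, with a simplified, hypothetical 
stand-in for HBaseServer.Connection (only the caching idea is taken from the 
description above):

{code}
import java.net.Socket;

// Sketch only: cache the remote address once, when the connection is set
// up, instead of walking through the Socket on every call.
public class ConnectionSketch {
  private final Socket socket;
  private final String hostAddress;   // cached at connection setup

  public ConnectionSketch(Socket socket) {
    this.socket = socket;
    this.hostAddress = socket.getInetAddress().getHostAddress();
  }

  // Before: each call went through socket.getInetAddress().
  // After: return the cached field (~1600ns vs ~4000ns per the numbers
  // quoted above).
  public String getHostAddress() {
    return hostAddress;
  }
}
{code}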

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419695#comment-13419695
 ] 

Lars Hofhansl commented on HBASE-6428:
--

That is an excellent point.
We should also think about HBASE-6427 with this in mind.

> Pluggable Compaction policies
> -
>
> Key: HBASE-6428
> URL: https://issues.apache.org/jira/browse/HBASE-6428
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>
> For some use cases it is useful to allow more control over how KVs get 
> compacted.
> For example, one could envision storing old versions of a KV in separate 
> HFiles, which then rarely have to be touched/cached by queries querying for 
> new data.
> In addition, these date-ranged HFiles can easily be used for backups while 
> maintaining historical data.
> This would be a major change, allowing compactions to provide multiple 
> targets (not just a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419696#comment-13419696
 ] 

Lars Hofhansl commented on HBASE-5547:
--

+1 on patch. Ted pinged me that he is out already.
Since this is a Salesforce patch, I should commit it anyway.

Will do so as soon as I get to it.

Jesse, do you have a feeling for how different a 0.94 patch would be?

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419697#comment-13419697
 ] 

Jesse Yates commented on HBASE-5547:


@Lars I don't think it would be all that different. I'll take a crack at it 
next week (after dealing with the next round of HBASE-6055 stuff).

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419710#comment-13419710
 ] 

Lars Hofhansl commented on HBASE-6428:
--

As it stands, the coprocessor could at best function as a post-filter. In this 
case that might be good enough, though. preCompact could skip the normal 
processing, use the passed internalScanner, and then write to multiple store 
files, for example based on TS.

HBASE-6427 is different in that there I want to "unfilter" something, so any 
coprocessor hooks would have to be deeper in the stack.
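
To make the post-filter idea concrete, here is a self-contained sketch. 
Scanner and Cell are simplified stand-ins for InternalScanner and KeyValue, 
and archive() is a placeholder for a second store file writer; none of this 
is the actual coprocessor API:

{code}
import java.io.IOException;
import java.util.Iterator;
import java.util.List;

// Wraps the scanner handed to preCompact and diverts old versions (by
// timestamp) away from the normal compaction output.
interface Scanner {
  boolean next(List<Cell> results) throws IOException;
}

class Cell {
  final long timestamp;
  Cell(long timestamp) { this.timestamp = timestamp; }
}

public class TimePartitioningScanner implements Scanner {
  private final Scanner delegate;
  private final long cutoffTs;  // assumption: older versions go elsewhere

  public TimePartitioningScanner(Scanner delegate, long cutoffTs) {
    this.delegate = delegate;
    this.cutoffTs = cutoffTs;
  }

  @Override
  public boolean next(List<Cell> results) throws IOException {
    boolean more = delegate.next(results);
    for (Iterator<Cell> it = results.iterator(); it.hasNext();) {
      Cell cell = it.next();
      if (cell.timestamp < cutoffTs) {
        archive(cell);  // placeholder: append to a separate, date-ranged HFile
        it.remove();    // keep it out of the normal compaction output
      }
    }
    return more;
  }

  private void archive(Cell cell) {
    // placeholder for a second StoreFile writer
  }
}
{code}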

> Pluggable Compaction policies
> -
>
> Key: HBASE-6428
> URL: https://issues.apache.org/jira/browse/HBASE-6428
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>
> For some use cases it is useful to allow more control over how KVs get 
> compacted.
> For example, one could envision storing old versions of a KV in separate 
> HFiles, which then rarely have to be touched/cached by queries querying for 
> new data.
> In addition, these date-ranged HFiles can easily be used for backups while 
> maintaining historical data.
> This would be a major change, allowing compactions to provide multiple 
> targets (not just a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419722#comment-13419722
 ] 

Lars Hofhansl commented on HBASE-5659:
--

Without the parent patch the revised test fails every time. With the parent 
patch it fails rarely.
I do not know what the issue is.

This only happens when the test does heavy flushing (more than 1000 flushes 
happen during the course of the test), so the problem might be there.

I can offer to disable the test or to reduce the number of flushes for now, but 
of course that papers over the problem.

I also would not mind if somebody else had a look at the test and checked 
whether the test logic itself is flawed.

> TestAtomicOperation.testMultiRowMutationMultiThreads is still failing 
> occasionally
> --
>
> Key: HBASE-5659
> URL: https://issues.apache.org/jira/browse/HBASE-5659
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.96.0
>
>
> See run here: 
> https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/
> {quote}
> 2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): 
> Storescanner.peek() is changed where before = 
> rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = 
> rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0
> 2012-03-27 04:36:12,629 INFO  [Thread-121] regionserver.HRegion(1558): 
> Finished memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81. in 14ms, 
> sequenceid=7927, compaction requested=true
> 2012-03-27 04:36:12,629 DEBUG [Thread-126] 
> regionserver.TestAtomicOperation$2(362): flushing
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): 
> Started memstore flush for 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., current region 
> memstore size 1.9k
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): 
> Finished snapshotting 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., commencing wait 
> for mvcc, flushsize=1968
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): 
> Finished snapshotting, commencing flushing stores
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating 
> file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
>  with permission=rwxrwxrwx
> 2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): 
> Initialized with CacheConfig:enabled [cacheDataOnRead=true] 
> [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
> [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
> 2012-03-27 04:36:12,631 INFO  [Thread-126] 
> regionserver.StoreFile$Writer(997): Delete Family Bloom filter type for 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57:
>  CompoundBloomFilterWriter
> 2012-03-27 04:36:12,632 INFO  [Thread-126] 
> regionserver.StoreFile$Writer(1220): NO General Bloom and NO DeleteFamily was 
> added to HFile 
> (/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57)
>  
> 2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.Store(770): Flushed , 
> sequenceid=7934, memsize=1.9k, into tmp file 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
> 2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming 
> flushed file at 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
>  to 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57
> 2012-03-27 04:36:1

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419723#comment-13419723
 ] 

Lars Hofhansl commented on HBASE-5547:
--

I also verified in a real setup that an HFile is indeed archived and (by 
default) removed after 5 minutes. I was thrown off at first because the 
table/region/cf directory is not removed when empty.
I also made sure I can create/drop tables and that .META. is then backed up 
correctly.

So still +1 :)


> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5659) TestAtomicOperation.testMultiRowMutationMultiThreads is still failing occasionally

2012-07-20 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5659:
-

Fix Version/s: 0.94.2

> TestAtomicOperation.testMultiRowMutationMultiThreads is still failing 
> occasionally
> --
>
> Key: HBASE-5659
> URL: https://issues.apache.org/jira/browse/HBASE-5659
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Lars Hofhansl
>Priority: Minor
> Fix For: 0.96.0, 0.94.2
>
>
> See run here: 
> https://builds.apache.org/job/PreCommit-HBASE-Build/1318//testReport/org.apache.hadoop.hbase.regionserver/TestAtomicOperation/testMultiRowMutationMultiThreads/
> {quote}
> 2012-03-27 04:36:12,627 DEBUG [Thread-118] regionserver.StoreScanner(499): 
> Storescanner.peek() is changed where before = 
> rowB/colfamily11:qual1/7202/Put/vlen=6/ts=7922,and after = 
> rowB/colfamily11:qual1/7199/DeleteColumn/vlen=0/ts=0
> 2012-03-27 04:36:12,629 INFO  [Thread-121] regionserver.HRegion(1558): 
> Finished memstore flush of ~2.9k/2952, currentsize=1.6k/1640 for region 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81. in 14ms, 
> sequenceid=7927, compaction requested=true
> 2012-03-27 04:36:12,629 DEBUG [Thread-126] 
> regionserver.TestAtomicOperation$2(362): flushing
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1426): 
> Started memstore flush for 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., current region 
> memstore size 1.9k
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1474): 
> Finished snapshotting 
> testtable,,1332822963417.7cd30e219714cfc5e91f69def66e7f81., commencing wait 
> for mvcc, flushsize=1968
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] regionserver.HRegion(1484): 
> Finished snapshotting, commencing flushing stores
> 2012-03-27 04:36:12,630 DEBUG [Thread-126] util.FSUtils(153): Creating 
> file=/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
>  with permission=rwxrwxrwx
> 2012-03-27 04:36:12,631 DEBUG [Thread-126] hfile.HFileWriterV2(143): 
> Initialized with CacheConfig:enabled [cacheDataOnRead=true] 
> [cacheDataOnWrite=false] [cacheIndexesOnWrite=false] 
> [cacheBloomsOnWrite=false] [cacheEvictOnClose=false] [cacheCompressed=false]
> 2012-03-27 04:36:12,631 INFO  [Thread-126] 
> regionserver.StoreFile$Writer(997): Delete Family Bloom filter type for 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57:
>  CompoundBloomFilterWriter
> 2012-03-27 04:36:12,632 INFO  [Thread-126] 
> regionserver.StoreFile$Writer(1220): NO General Bloom and NO DeleteFamily was 
> added to HFile 
> (/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57)
>  
> 2012-03-27 04:36:12,632 INFO  [Thread-126] regionserver.Store(770): Flushed , 
> sequenceid=7934, memsize=1.9k, into tmp file 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
> 2012-03-27 04:36:12,632 DEBUG [Thread-126] regionserver.Store(795): Renaming 
> flushed file at 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/.tmp/61954619003e469baf1a34be5ff2ec57
>  to 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57
> 2012-03-27 04:36:12,634 INFO  [Thread-126] regionserver.Store(818): Added 
> /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/trunk/target/test-data/b9091c3c-961e-4035-850a-83ad14d517cc/TestAtomicOperationtestMultiRowMutationMultiThreads/testtable/7cd30e219714cfc5e91f69def66e7f81/colfamily11/61954619003e469baf1a34be5ff2ec57,
>  entries=12, sequenceid=7934, filesize=1.3k
> 2012-03-27 04:36:12,642 DEBUG [Thread-118] 
> regionserver.TestAtomicOperation$2(392): []
> Exception in thread "Thread-118" junit.framework.AssertionF

[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419724#comment-13419724
 ] 

Lars Hofhansl commented on HBASE-5547:
--

OK. One more question I asked just now on RB:

From Matteo:
{quote}
MasterFileSystem contains deleteRegion() and deleteTable(), which call 
fs.delete() with the recursive flag on.
These two methods get called by DeleteTableHandler (drop table).
In a backup/snapshot situation we want to keep the regions/hfiles.
{quote}

My follow-up question:
{quote}
I find that deleteRegion() was addressed, but not deleteTable().

That means that if a table is dropped, the HFiles would be deleted and not 
archived.
So it seems we should either:
- also delete the table's archive directory (since it would be 
incomplete anyway), or
- archive all the HFiles before deleting them.

What do you think, Jesse?
{quote}
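
For illustration, the "archive rather than delete" behavior under discussion 
could look roughly like this; the .archive directory layout and the method 
are assumptions, not the actual patch:

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: move a region directory under an archive root instead of
// calling fs.delete() with the recursive flag, so a concurrent backup can
// still copy the HFiles.
public class HFileArchiverSketch {
  public static void archiveRegion(FileSystem fs, Path rootDir, Path regionDir)
      throws IOException {
    Path archiveDir = new Path(rootDir, ".archive");
    if (!fs.exists(archiveDir) && !fs.mkdirs(archiveDir)) {
      throw new IOException("Failed to create " + archiveDir);
    }
    // rename is cheap on HDFS; a cleaner chore can remove the archived
    // files later (after ~5 minutes by default, per the comment above).
    if (!fs.rename(regionDir, new Path(archiveDir, regionDir.getName()))) {
      throw new IOException("Failed to archive " + regionDir);
    }
  }
}
{code}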



> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419725#comment-13419725
 ] 

Lars Hofhansl edited comment on HBASE-5547 at 7/21/12 2:49 AM:
---

Or is it that all regions are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)?

Edit: Spelling

  was (Author: lhofhansl):
Or is it that all region are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)
  
> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419725#comment-13419725
 ] 

Lars Hofhansl commented on HBASE-5547:
--

Or is it that all region are first deleted anyway, and only then the 
deleteTable is called (in DeleteTableHandler.handleTableOperation)

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5954) Allow proper fsync support for HBase

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419738#comment-13419738
 ] 

Lars Hofhansl commented on HBASE-5954:
--

I think the API could go multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc.; very lightweight), enabled 
with a config option (default on, I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# sync per Put. Gives control to the application. A batch put would hsync the 
WAL if at least one Put in the batch was marked with hsync. What about deletes? 
In 0.94 they are not batched; we could do it at the end of the operation there.
# Per RPC. Could send a flag with the RPC from the client, i.e. HTable would 
have a put(List<Put> puts, boolean hsync) method.
# HTable.hsync. Client calls this when data must be sync'ed. Most flexible, but 
incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.
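
As a sketch of what the per-Put option could look like at the API level 
(hypothetical names throughout; this is not the actual HBase client API):

{code}
import java.util.List;

// Hypothetical API sketch for the per-Put option above. Durability and
// DurablePut are illustrative names, not real HBase classes.
public class HsyncSketch {
  public enum Durability { DEFAULT, SYNC_WAL, HSYNC_WAL }

  public static class DurablePut {
    private final byte[] row;
    private final Durability durability;

    public DurablePut(byte[] row, Durability durability) {
      this.row = row;
      this.durability = durability;
    }

    public Durability getDurability() { return durability; }
  }

  // Server side: the whole batch gets hsync'ed if at least one Put asks.
  public static boolean batchNeedsHsync(List<DurablePut> batch) {
    for (DurablePut p : batch) {
      if (p.getDurability() == Durability.HSYNC_WAL) {
        return true;
      }
    }
    return false;
  }
}
{code}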


> Allow proper fsync support for HBase
> 
>
> Key: HBASE-5954
> URL: https://issues.apache.org/jira/browse/HBASE-5954
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 5954-trunk-hdfs-trunk-v2.txt, 
> 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk-v4.txt, 
> 5954-trunk-hdfs-trunk-v5.txt, 5954-trunk-hdfs-trunk-v6.txt, 
> 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Comment Edited] (HBASE-5954) Allow proper fsync support for HBase

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419738#comment-13419738
 ] 

Lars Hofhansl edited comment on HBASE-5954 at 7/21/12 4:01 AM:
---

I think the API could go multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc.; very lightweight), enabled 
with a config option (default on, I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# hsync for tables or column families for HFiles (configured in the 
table/column descriptor)
# hsync for tables or column families for the WAL (configured in the 
table/column descriptor)
# WAL hsync per Put. Gives control to the application. A batch put would hsync 
the WAL if at least one Put in the batch was marked with hsync. What about 
deletes? In 0.94 they are not batched; we could do it at the end of the 
operation there.
# WAL hsync per RPC. Could send a flag with the RPC from the client, i.e. 
HTable would have a put(List<Put> puts, boolean hsync) method.
# HTable.hsync. Client calls this when the WAL must be sync'ed. Most flexible, 
but incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.

Edit: Forgot some options.

  was (Author: lhofhansl):
I think the API could go multiple ways (these are not mutually exclusive):

# hsync for HFiles (would guard compactions, etc.; very lightweight), enabled 
with a config option (default on, I think)
# hsync all WAL edits (very expensive, but would not require client changes), 
enabled with a config option (default off)
# sync per Put. Gives control to the application. A batch put would hsync the 
WAL if at least one Put in the batch was marked with hsync. What about deletes? 
In 0.94 they are not batched; we could do it at the end of the operation there.
# Per RPC. Could send a flag with the RPC from the client, i.e. HTable would 
have a put(List<Put> puts, boolean hsync) method.
# HTable.hsync. Client calls this when data must be sync'ed. Most flexible, but 
incurs an extra RPC to the RegionServer just to force the hsync.

Comments welcome.

  
> Allow proper fsync support for HBase
> 
>
> Key: HBASE-5954
> URL: https://issues.apache.org/jira/browse/HBASE-5954
> Project: HBase
>  Issue Type: Improvement
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 5954-trunk-hdfs-trunk-v2.txt, 
> 5954-trunk-hdfs-trunk-v3.txt, 5954-trunk-hdfs-trunk-v4.txt, 
> 5954-trunk-hdfs-trunk-v5.txt, 5954-trunk-hdfs-trunk-v6.txt, 
> 5954-trunk-hdfs-trunk.txt, hbase-hdfs-744.txt
>
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-6406) TestReplicationPeer.testResetZooKeeperSession and TestZooKeeper.testClientSessionExpired fail frequently

2012-07-20 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6406:
-

Fix Version/s: (was: 0.94.1)
   0.94.2

> TestReplicationPeer.testResetZooKeeperSession and 
> TestZooKeeper.testClientSessionExpired fail frequently
> 
>
> Key: HBASE-6406
> URL: https://issues.apache.org/jira/browse/HBASE-6406
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.94.1
>Reporter: Lars Hofhansl
>Assignee: Lars Hofhansl
> Fix For: 0.96.0, 0.94.2
>
> Attachments: 6406.txt, testReplication.jstack, testZooKeeper.jstack
>
>
> Looking back through the 0.94 test runs these two tests accounted for 11 of 
> 34 failed tests.
> They should be fixed or (temporarily) disabled.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419746#comment-13419746
 ] 

Zhihong Ted Yu commented on HBASE-5547:
---

I think the latest patch has addressed HFile archival when deleteRegion() is 
called.

Backing up / restoring a table can be addressed in HBASE-6055.

> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5547) Don't delete HFiles when in "backup mode"

2012-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419748#comment-13419748
 ] 

Lars Hofhansl commented on HBASE-5547:
--

True, but if dropping a table just drops the latest HFiles on the floor and 
leaves a partial backup around, this entire exercise is pointless.

Anyway, from the code in DeleteTableHandler.handleTableOperation it looks like 
all regions are deleted first (using deleteRegion) and then the table directory 
is deleted, so it should be correct. Just making sure here.


> Don't delete HFiles when in "backup mode"
> -
>
> Key: HBASE-5547
> URL: https://issues.apache.org/jira/browse/HBASE-5547
> Project: HBase
>  Issue Type: New Feature
>Reporter: Lars Hofhansl
>Assignee: Jesse Yates
> Fix For: 0.94.2
>
> Attachments: 5547-v12.txt, 5547-v16.txt, hbase-5447-v8.patch, 
> hbase-5447-v8.patch, hbase-5547-v9.patch, java_HBASE-5547_v13.patch, 
> java_HBASE-5547_v14.patch, java_HBASE-5547_v15.patch, 
> java_HBASE-5547_v4.patch, java_HBASE-5547_v5.patch, java_HBASE-5547_v6.patch, 
> java_HBASE-5547_v7.patch
>
>
> This came up in a discussion I had with Stack.
> It would be nice if HBase could be notified that a backup is in progress (via 
> a znode for example) and in that case either:
> 1. rename HFiles to be deleted to .bck
> 2. rename the HFiles into a special directory
> 3. rename them to a general trash directory (which would not need to be tied 
> to backup mode).
> That way one should be able to get a consistent backup based on HFiles (HDFS 
> snapshots or hard links would be better options here, but we do not have 
> those).
> #1 makes cleanup a bit harder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira